Reconciling properties

Hi everyone,
I have a table of audit data with the following columns:

  1. Entity
  2. Financial year
  3. Audit field
  4. Amount

I have over 8,000 rows of this. So far, I have reconciled the first column, since the entities already exist as items.

What I want to do is reconcile the "audit field" to use it as a statement (all audit field values already exist as properties in the wikibase), but I apparently cannot use that column in my schema. Is there a way to do this?

I see two alternatives that are not appealing:

  • transpose my table to have audit fields as columns
  • use facets to filter my data and manually enter the audit field as a statement in the schema [edit: I was going to do this, but it turns out that every time I change the statement in the schema, all values, qualifiers, and references disappear...]

Since I have around 100 different audit fields, these are not easy options to implement.

Any ideas?

Thanks!

Hi @epfo,

That's a feature request I can relate to pretty well. I'm not aware of any way to do this in OpenRefine as things stand.

The first step would obviously be to use a reconciliation service that can reconcile properties. The default Wikidata service cannot, and I don't know of any that can as of today, but it would likely not be very hard to build.
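
To give an idea of the scale, here is a minimal sketch of the matching logic such a service might use. Everything in it is an assumption for illustration: the property IDs and labels are made up, the fuzzy scoring relies on Python's difflib, and reporting the datatype as the candidate's "type" is just one possible design. It only mimics the general shape of a reconciliation response (query keys mapped to lists of candidates) and is not an actual service.

```python
from difflib import SequenceMatcher

# Hypothetical property catalogue: ID -> (label, datatype).
# These IDs and labels are invented for the example.
PROPERTIES = {
    "P101": ("Total revenue", "quantity"),
    "P102": ("Total expenses", "quantity"),
    "P103": ("Auditor name", "string"),
}


def reconcile(queries, properties=PROPERTIES, limit=3):
    """Score every property label against each query text.

    `queries` maps a key such as "q0" to {"query": "<cell text>"}.
    Returns, per key, the best candidates sorted by descending score.
    """
    response = {}
    for key, query in queries.items():
        text = query["query"].strip().lower()
        candidates = []
        for pid, (label, datatype) in properties.items():
            # Similarity ratio in [0, 1], rescaled to a 0-100 score.
            ratio = SequenceMatcher(None, text, label.lower()).ratio()
            candidates.append({
                "id": pid,
                "name": label,
                "score": round(ratio * 100, 1),
                # Only an exact (case-insensitive) label match is auto-accepted.
                "match": text == label.lower(),
                # Assumption: expose the datatype as the candidate's type.
                "type": [{"id": datatype, "name": datatype}],
            })
        candidates.sort(key=lambda c: c["score"], reverse=True)
        response[key] = {"result": candidates[:limit]}
    return response
```

Calling `reconcile({"q0": {"query": "total revenue"}})` would rank the hypothetical P101 first. A real service would also need an HTTP layer and a manifest, which are left out here.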

Assuming we have that, it doesn't solve your problem entirely though. You'd need to be able to drag and drop reconciled properties in a schema, and that's not supported yet. It's not so clear to me how this could be built, because when adding a statement (or qualifier, or reference part) we need to know the property's datatype to be able to display an appropriate field for the value.
In general, your column of reconciled properties could potentially contain properties with different datatypes! I can think of a few approaches:

  • build the reconciliation service so that you can reconcile to properties of a specific datatype, ensuring all reconciled values are of a uniform datatype, which can then be recognized by the schema editor to offer the appropriate input field for the value.
  • somehow let users choose for themselves which datatype should be used in the schema editor, and find a way to report errors appropriately when the datatype of the property and of the value don't match.
  • accept mixed datatypes in the column of reconciled properties, and offer a generic input field for the value, which intelligently parses whatever is input there to match the datatype of the property. It's a bit cumbersome to build and the UX might be a bit confusing because the user needs to know how to format the values so that they are understood correctly. For datatypes like quantities, where we currently display an amount and a unit field, we'd need to offer a different syntax to specify the amount and unit in a single text field, perhaps. And good error reporting would also be needed.
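
To make these approaches a bit more concrete, here is a rough sketch of two of the pieces involved: checking that a column of reconciled properties shares a single datatype, and parsing a single text field according to a datatype. This is not how OpenRefine works today. The property IDs, the datatype table, and the "amount [unit]" syntax for quantities are all assumptions invented for the example.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical datatypes of the reconciled properties.
PROPERTY_DATATYPES = {"P101": "quantity", "P102": "quantity", "P103": "string"}


def column_datatype(property_ids, datatypes=PROPERTY_DATATYPES):
    """Return the one datatype shared by all properties, or raise ValueError."""
    found = {datatypes[pid] for pid in property_ids}
    if len(found) != 1:
        raise ValueError(f"expected exactly one datatype, found: {sorted(found)}")
    return found.pop()


def parse_value(datatype, text):
    """Parse one text field according to the property's datatype.

    Quantities use an assumed 'amount [unit]' syntax, e.g. '1500.50 Q4917'.
    """
    if datatype == "string":
        return {"type": "string", "value": text}
    if datatype == "quantity":
        parts = text.split()
        if not 1 <= len(parts) <= 2:
            raise ValueError(f"expected 'amount [unit]', got: {text!r}")
        try:
            amount = Decimal(parts[0])
        except InvalidOperation:
            raise ValueError(f"not a number: {parts[0]!r}") from None
        unit = parts[1] if len(parts) == 2 else None
        return {"type": "quantity", "amount": amount, "unit": unit}
    raise ValueError(f"unsupported datatype: {datatype!r}")
```

The first function corresponds to the uniform-datatype idea, the second to the generic input field. In both cases the error messages are where most of the real design work would go.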

May I ask whether, in your case, your 100 different audit fields all have the same datatype?

Thanks for the detailed reply, this is appreciated!

To be clear, when you mention the various approaches, this isn't meant as guidance for me, right? Because I do not have the first clue about doing any of this. Just checking.

And yes, all the audit fields have the same datatype: they're just amounts of money, so quantity (ideally with a currency as the unit, but it would even work without one).

Side question, which may be off-topic here: is the proposed solution out of reach for now (you say that it could be built, but that implies it would take quite some time, no?), so that this would be better handled via QuickStatements? (albeit at the cost of a crazy amount of time...)

Yes, those would be approaches for developers to change OpenRefine so that a future version would support your use case.

> this would be better handled via QuickStatements? (albeit at the cost of a crazy amount of time...)

That can likely be done via QuickStatements (QS). OpenRefine may still be of some help there, but yes, it's probably not straightforward.
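
As a rough illustration of that route, the sketch below turns rows of (entity ID, financial year, audit field, amount) into tab-separated QuickStatements lines. Several things here are assumptions to check against your own Wikibase and the QuickStatements help page: the field-to-property mapping and the "financial year" qualifier property are invented, the year is treated as a plain calendar year with year precision, and entities are assumed to be already reconciled to their IDs.

```python
# Hypothetical mapping from audit field text to property IDs in the Wikibase.
FIELD_TO_PROPERTY = {"Total revenue": "P101", "Total expenses": "P102"}
# Hypothetical property used as a "financial year" qualifier.
YEAR_QUALIFIER = "P50"


def to_quickstatements(rows, field_to_property=FIELD_TO_PROPERTY, unit=None):
    """Build one tab-separated statement line per row.

    rows: iterable of (entity_id, year, audit_field, amount).
    unit: optional item ID such as "Q4917", appended as U<number>.
    Returns (lines, skipped), where skipped holds rows with unknown fields.
    """
    lines, skipped = [], []
    for row in rows:
        entity_id, year, field, amount = row
        pid = field_to_property.get(field)
        if pid is None:
            # Keep unmapped rows aside instead of guessing a property.
            skipped.append(row)
            continue
        value = str(amount)
        if unit:
            value += "U" + unit.lstrip("Q")
        # Assumed time syntax with year precision (/9).
        year_value = f"+{int(year):04d}-01-01T00:00:00Z/9"
        lines.append("\t".join([entity_id, pid, value, YEAR_QUALIFIER, year_value]))
    return lines, skipped
```

The `skipped` list makes it easy to see which audit fields still lack a property mapping before anything is uploaded.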

Thanks a lot. Given that we would be happy to move forward, I guess we'll have to downgrade to QuickStatements and hope that it works.

What's the best way to encourage the development of these functionalities in OpenRefine?

If you haven't filled it in yet, our user survey is currently running, and you could voice such a need there.

If you have a GitHub account, you could open an issue about support for variable properties in schemas, as I don't think we have one yet. I could only find one about adding support for creating properties via OpenRefine, which is related to the extent that it would also require a reconciliation service for properties.

Thanks! Actually, I already filled in the survey. But I can ask a colleague to create the GitHub issue, yes.