Exact matching on a property during reconcile

I am trying to reconcile an export of a database (all Dutch mills in milldatabase.org) against a property in Wikidata. For Dutch mills this database contains the 'Ten-bruggencatenummer' (tbc) in the external_number column, which is what I want to reconcile on. So I reconcile the name column, and in the next screen I map the external_number column to the Ten-bruggencatenummer property. However, I noticed that I got matches based on an exact name match even though the tbc was different. Is there any way to enforce an exact match on the property (apart from case differences) during reconciliation?

A different approach might be to pull the tbc from the reconciled column into a new column and then compare the two columns to find the rows where they differ, but I didn't find an option for that either.

Can anybody point me in the right direction? The final aim is to also add the milldatabase ID to Wikidata wherever there is a match on tbc.

The scoring and filtering of results differs between Reconciliation Services.

With the Wikidata Reconciliation Service, additional columns influence the score, but candidates are only filtered out when their score falls below a certain threshold. A mismatch in an additional column does not by itself exclude a candidate.

This means that an imperfect name match combined with a perfect tbc match results in a lower score. The same is true for a perfect name match combined with an imperfect tbc match.

Mitigation strategies:

  • You could first reconcile all the mills that already have a tbc in Wikidata by reconciling against the column with the tbc, and then filter for perfect matches only.
  • You could add multiple tbc columns and use them all for reconciliation, to increase the impact a wrong tbc has on the score.
  • You could use the "best candidate's name edit distance" facet (Reconciling | OpenRefine) to find results with perfect name matches but imperfect scores (because of a tbc mismatch).
  • After reconciliation you could load the tbc data from Wikidata as described in the docs (Reconciling | OpenRefine). After that you can add a custom text facet (Exploring facets | OpenRefine) that compares the two tbc columns with an expression like cells["tbc_wikidata"].value == cells["tbc"].value
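If you export the project (for example as CSV) and want to double-check the result outside OpenRefine, the comparison done by that facet expression can be sketched in Python. This is a minimal sketch under stated assumptions: the column names tbc and tbc_wikidata are taken from the expression above, while normalize_tbc, tbc_matches and split_rows are hypothetical helpers, not part of OpenRefine. Stripping whitespace, optional case folding and treating empty cells as a non-match are also assumptions of this sketch, not OpenRefine behavior.

```python
from typing import Optional

# Hypothetical helpers; column names "tbc" and "tbc_wikidata" follow the facet expression.


def normalize_tbc(value: Optional[str], case_insensitive: bool) -> Optional[str]:
    """Strip surrounding whitespace and optionally fold case; blank cells become None.

    Whitespace stripping and case folding are assumptions of this sketch.
    """
    if value is None:
        return None
    value = value.strip()
    if not value:
        return None
    return value.casefold() if case_insensitive else value


def tbc_matches(row: dict, case_insensitive: bool = True) -> bool:
    """True when both tbc columns are filled and equal after normalization."""
    local = normalize_tbc(row.get("tbc"), case_insensitive)
    remote = normalize_tbc(row.get("tbc_wikidata"), case_insensitive)
    # An empty cell on either side counts as a non-match, so it ends up for manual review.
    return local is not None and local == remote


def split_rows(rows: list, case_insensitive: bool = True) -> tuple:
    """Split rows into (matched, mismatched), like the true/false buckets of the facet."""
    matched = [r for r in rows if tbc_matches(r, case_insensitive)]
    mismatched = [r for r in rows if not tbc_matches(r, case_insensitive)]
    return matched, mismatched
```

The matched rows are the candidates for adding the milldatabase ID to Wikidata, and the mismatched rows (including items that have no tbc in Wikidata yet) are the ones to review by hand. Treating empty cells as a non-match is a deliberate choice here, so that two blank cells are never counted as agreeing.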

Currently there is no go-to solution, as the best strategy depends on the data you have and the data already in Wikidata. But the ideas above may give you some starting points to try out and extend.

Thanks. I was already considering your fourth option, which I think makes the most sense with this data. I had already started on it but didn't know how to do the comparison, so your instructions definitely helped. The only change I made was adding toTitlecase() to both sides of the expression so the comparison is case-insensitive, which turned out to be necessary.
