Hello,
I am using the EDTF format for the dates stored in my own Wikibase and I just noticed two problems with the OpenRefine reconciliation service. I would like to know if there are any solutions, or if I should fall back to the classic "dates" datatype.
Versions: until this morning I was using OpenRefine 3.7.2; I have now upgraded to 3.8.1, with no improvement.
-
The first problem: when I reconcile a column of personal names against my Wikibase items and use the birth date column to improve the results, by comparing it with the EDTF birth date property of my Wikibase, the reconciliation score does not take the EDTF date values into account and does not reflect reality well.
For example: I get a score of 100% when reconciling only the label column "Jean Dupont" with the "Jean Dupont" item of my Wikibase. If I add the birth date column to improve the result, the score drops to 70%, even when both have the same (EDTF) birth date. Not being able to use these EDTF properties is a big gap, since they would facilitate the deduplication process.
-
The second problem is related: when I want to Add columns from reconciled values, it works except for the EDTF Date/Time properties. The cells stay empty even if some values do exist. Again, this represents a serious loss of information.
Does anyone have any experience with this, or advice to give?
Thank you!
Anne
Is your birth date column in OpenRefine a string value or a date value when you do the reconciliation? I don't have a local Wikibase, but against Wikidata I see that I can only use a date column effectively if it is a string (which does seem counter-intuitive).
e.g. if I have a table
Name | DoB
Jean Dupont | 1938-05-14
Where the DoB is just text, and I reconcile against Wikidata using the DoB column as secondary information, I get a match with a score of 100 against the single expected Wikidata entry (Jean Dupont - Wikidata). However, if I apply a "toDate()" transformation to the DoB column first and then reconcile in the same way, I get a match to each of the existing Jean Dupont entries in Wikidata, each with a score around 70, exactly as you describe.
I don't see the second issue you describe with Wikidata, and I don't know enough about running a local Wikibase to know whether this could be something about your Wikibase instance or some other problem, sorry.
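For reference, here is a minimal Python sketch of the kind of query a reconciliation client sends when a second column is used as extra information, following the general shape of the Reconciliation Service API. The `build_query` helper is hypothetical, and the `P569` (date of birth) and `Q5` (human) defaults assume Wikidata; a local Wikibase would have its own identifiers. The only point is that the DoB value travels as a plain string.

```python
import json


def build_query(name, dob, dob_pid="P569", type_id="Q5"):
    """Build one reconciliation query with the DoB as a plain-string property value.

    Hypothetical helper; the default identifiers assume Wikidata.
    """
    return {
        "query": name,
        "type": type_id,
        "properties": [{"pid": dob_pid, "v": dob}],
    }


# A batch is a JSON object keyed by arbitrary query ids.
queries = {"q0": build_query("Jean Dupont", "1938-05-14")}
payload = json.dumps(queries)
print(payload)
```

What the service then does with the string in `v` (date-aware comparison or plain text matching) depends on the datatype of the target property.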
Thank you for your reply and for having tested it.
Unfortunately, my DoB column is already a string value. I think this really has to do with the new EDTF date/time datatype (cf. https://www.mediawiki.org/wiki/Extension:Wikibase_EDTF).
You are right, the reconciliation service does not treat EDTF values as dates but rather as bare strings. So EDTF values are compared using basic string fuzzy-matching, which is likely to give rather poor results.
The reconciliation service is currently not actively developed, but you could request this feature here:
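To illustrate why string matching is a poor fit for dates, here is a minimal Python sketch. It is not the reconciliation service's actual scoring code: `difflib.SequenceMatcher` stands in for whatever fuzzy matcher is used, and `edtf_compatible` is a hypothetical helper that only handles plain positive-year EDTF dates of the form YYYY, YYYY-MM or YYYY-MM-DD.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Generic character-level similarity in [0, 1] (stand-in for a fuzzy matcher)."""
    return SequenceMatcher(None, a, b).ratio()


def edtf_compatible(a: str, b: str) -> bool:
    """Hypothetical date-aware check: two dates agree if they match on every
    component (year, month, day) that both of them specify."""
    parts_a, parts_b = a.split("-"), b.split("-")
    shared = min(len(parts_a), len(parts_b))
    return parts_a[:shared] == parts_b[:shared]


# Same person, birth date recorded at different precision.
same_sim = string_similarity("1938-05-14", "1938")
same_ok = edtf_compatible("1938-05-14", "1938")

# Different birth year, but the strings differ by a single character.
diff_sim = string_similarity("1938-05-14", "1939-05-14")
diff_ok = edtf_compatible("1938-05-14", "1939-05-14")

print(same_sim, same_ok)
print(diff_sim, diff_ok)
```

In this sketch the two dates that actually conflict look more alike as strings than the two that agree, which is the kind of mismatch a date-aware comparison would avoid.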
Thank you for your reply, Antonin.
However, it doesn't seem to explain why I can't manage to add this content in a new column ("Add columns from reconciled values"). Do you know the reason, or whether this is normal behavior?
That's curious indeed, I would have expected fetching via "add columns from reconciled values" to work.