Situation: I am reconciling data with the reconciliation service in OpenRefine based on a column called GND-IDs. For the reconciliation service to work, it needs to identify the GND-ID of each entry.
Problem: the reconciliation service found the GND-IDs when I ran the query last year, but when I reconciled the same GND-IDs again this year, it could no longer identify some of them.
Why is that? Based on this finding, it seems that the reconciliation service is quite unreliable.
I would appreciate any help in this regard.
As far as I know the GND, this should not happen: GND-IDs are not deleted, only redirected to other GND-IDs when entities turn out to be duplicates. Can you provide examples?
Besides that: the GND reconciliation service is provided by hbz (Hochschulbibliothekszentrum des Landes NRW), so you could enter the IDs in the search box on their website lobid-gnd to check whether the service is working correctly.
For reproducibility it would be nice to have an example of the GND-IDs that are supposedly no longer available.
I also sometimes stumble upon GND-IDs in our data that no longer seem to exist in the GND at all.
I usually assumed these were copy-and-paste errors, but with a few more examples it would be worth investigating, based on older dumps of the GND, whether some IDs were removed instead of redirected.
Another common error is to forget to remove whitespace in the ID column, as this also results in problems resolving the entity using the reconciliation API.
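If the IDs are cleaned in a script before they are loaded into OpenRefine, the whitespace fix might look roughly like the sketch below. The function name clean_gnd_id and the sample values are only placeholders for illustration, not part of OpenRefine or the reconciliation service.

```python
def clean_gnd_id(raw):
    """Return the ID with leading and trailing whitespace removed.

    Hypothetical helper for illustration; empty cells (None) are passed through.
    """
    if raw is None:
        return None
    # str.strip() removes spaces, tabs, newlines and non-breaking spaces
    # at both ends of the string.
    return raw.strip()


# Sample values are placeholders, not IDs taken from the thread.
ids = ["  118540238", "118540238\t", "118540238"]
cleaned = [clean_gnd_id(i) for i in ids]
```

Inside OpenRefine itself, the same effect should be available through the common transform for trimming leading and trailing whitespace.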
I solved the problem after realizing that the datatypes in the column were mixed: some values were numeric, some were text. Once I converted all numeric GND-IDs to text, the reconciliation service ran without problems.
To convert all values to strings, do the following: go to “Add column based on this column”, name the new column “GND-ID-Text”, then enter the following formula in the GREL expression box: if(value.type() == "number", value.toString("%.0f"), value). Now all GND-IDs are strings, and the reconciliation service can match as many of them as possible.
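For anyone preparing the column outside OpenRefine, the same idea can be sketched in Python. This is only an illustration of the logic in the GREL formula, assuming the IDs arrive as a mix of numbers and strings; gnd_id_to_text is a made-up helper name and the numeric sample value is a placeholder.

```python
def gnd_id_to_text(value):
    """Format numeric IDs as text without a decimal part; leave other values as they are.

    Hypothetical helper mirroring the GREL formula
    if(value.type() == "number", value.toString("%.0f"), value).
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "%.0f" % value
    return value


# A mixed column: a float, an int, and a value that is already text.
mixed = [118540238.0, 118540238, "4-12349"]
as_text = [gnd_id_to_text(v) for v in mixed]
```

Values that are already text pass through unchanged, so the conversion can be applied to the whole column safely.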
Oh yes, handling GND-IDs as numbers often presents problems… as sometimes GND-IDs like
4-12349 are evaluated to