Is there a way to tell OpenRefine to reconcile only against a subset of Wikidata, e.g. people from the Swedish Parliament?
VALUES ?member {
wd:Q33071890
wd:Q81531912
wd:Q82697153
wd:Q10655178
}
?person wdt:P39 ?member .
I think this question is quite interesting as I was wondering how to do that myself but did not take the time to find out how. You need to use some "hidden" features of OpenRefine and the scoring algorithm from Wikidata for that. I added the links where these are described at the end.
This is how I got it working in OpenRefine 3.6:
Note that you need the names, not the QIDs.
Note that you need to use records mode, so that the different names basically get combined via a logical OR. If you needed a logical AND, you would use a separate column for each name.
Reconcile against Wikidata using the column with the different names of the Swedish parliament as an additional property, with P39 as the property type.
Members of the listed parliaments should receive higher scores than other people in Wikidata.
Here are some sources with background information:
Thanks, I will test it.
I feel that as WD gets bigger, this "subset" functionality is getting more important. The interesting thing with members of the Swedish Parliament is that in the early 1900s we got more people who were part of the Social Democrats, and they mostly had "common" names based on patronymics, like "the son of Anders" = Andersson -> the "standard" OpenRefine reconciliation is a mess, and the new Wikicommons add-on, where you can't preview uploaded pictures, doesn't help you.
See feature request of preview —> File preview before uploading to Wikicommons · Issue #5594 · OpenRefine/OpenRefine · GitHub
If you want to reconcile against a small enough subset of Wikidata, you could consider writing a SPARQL query defining the set of such entities, downloading the results, and loading them into csv-reconcile or reconcile-csv, which both let you run a reconciliation endpoint off a CSV table. That would ensure you get matches from this subset only.
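Building on the fragment from the original question, such a query could look like the following sketch (the label service and the language preference are assumptions; adjust the output columns to whatever name/ID layout your CSV reconciliation tool expects):

```sparql
SELECT ?person ?personLabel WHERE {
  VALUES ?member {
    wd:Q33071890
    wd:Q81531912
    wd:Q82697153
    wd:Q10655178
  }
  ?person wdt:P39 ?member .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "sv,en". }
}
```

Running this on the Wikidata Query Service and exporting as CSV gives you a table with one row per person, which is the shape these CSV-based reconciliation services consume.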
The default Wikidata recon service is pretty useless when it comes to reconciling people, since the type system cannot be used to filter by occupation. Perhaps the “occupation” property could be treated just like “instance of” by the reconciliation system, so that one could filter more accurately. It would be a fairly easy change to make.
There is also a dedicated recon service for people, run by Ontotext, that you can add with the URL “https://reconcile.ontotext.com/people”.
I'm not sure that I agree with this assessment. I successfully reconcile people with Wikidata all the time using the default reconciliation service. I tend to address the above problem by adding columns for occupations / countries of citizenship, reconciling these first, and then adding these as additional properties to be taken into account while reconciling. Works fairly OK for me most of the time, although of course it's not perfect.
A main hurdle to working more efficiently with the WD recon service is IMO really UI-related: one needs to use the mouse pretty awkwardly, and hover and click a lot, to properly disambiguate people (or entities generally) with similar names and to identify and select the right one.
I've tried the Ontotext recon service for people and I had bad experiences with it:
Off topic: the lobid-gnd API for OpenRefine allows you to use parts of the query string syntax from ElasticSearch.
In the context of searching for people with several columns of information (birthdate, occupation, …) this is really awesome. You can also declare date ranges or perform fuzzy matching… so far this has given me the best user experience when performing reconciliation tasks.
I wrote a (German) tutorial on how to use the lobid-gnd API with OpenRefine last year.
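To illustrate the kind of query string syntax meant here, an Elasticsearch-style query against lobid-gnd might look like this sketch (the field names are assumptions based on the lobid-gnd JSON; check the API documentation for the exact fields):

```
preferredName:Me?er~ AND dateOfBirth:[1880 TO 1890]
```

Here `Me?er~` combines a single-character wildcard with fuzzy matching (so it would also catch Meier/Meyer/Mayer-type spellings), and the bracketed range restricts the birth date, which is exactly the kind of disambiguation that is hard to express in the plain Wikidata recon service.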
Interesting discussion altogether, and interesting workarounds.
I would like to point out one additional limitation of the reconciliation service, which is related to these. The reconciliation type only considers P31, not P279. Therefore, it is impossible to find an obvious match for a subtype of something.
Another comment: I agree that the reconciliation UI cannot cope with the breadth of data on Wikidata and needs a severe overhaul. It would be interesting to know what the chances are of creating custom interfaces: would they need to live in a completely separate environment and recreate all the other existing functionality, or would there be ways to create UI extensions for OR?
Susanna_Anas, December 22:
> I would like to point out one additional limitation of the reconciliation service, which is related to these. The reconciliation type only considers P31, not P279. Therefore, it is impossible to find an obvious match for a subtype of something.
The Wikidata reconciliation service seems to have code to do this. Does it not work in practice? Do you have an example which could be used to investigate?
Tom
Maybe we should make a separate thread for this. It is possible that I don't know how to make use of the code, but in general, when a type is chosen, instances of that type are presented, not instances of its subclasses. I will create an example soon.
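For context, the behaviour being discussed corresponds in SPARQL terms to the difference between matching direct instances only and following the subclass hierarchy via a property path (wd:Q3918 "university" is used here purely for illustration):

```sparql
# Matching direct instances of the type only --
# roughly what the recon type filter does today:
?item wdt:P31 wd:Q3918 .

# Matching instances of the type or of any of its subclasses,
# e.g. items that are only tagged "instance of: public university":
?item wdt:P31/wdt:P279* wd:Q3918 .
```

If the service evaluated types the second way, an item typed as a subclass would still match a reconciliation restricted to the parent type.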