Wikidata reconciliation: fetch references through 'add columns from reconciled values'

I'm looking to fetch the 'reference' data from reconciled Wikidata items. For example, there is a person (Q26250609) with a date of birth, and that date of birth has a reference. How do I fetch the reference value through 'add columns from reconciled values'? Or is there another way?

I can't find any documentation on this, thanks!

Hi @Pai_Dekkers Not sure whether it is the best way, but I guess it should be possible with a SPARQL query. Maybe that's a better option than doing it directly in OpenRefine?
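A rough sketch of what the SPARQL route could look like, written in Python so the query URL can be built per item. It assumes the public Wikidata Query Service endpoint and the standard Wikidata RDF model, where a statement links to its references through prov:wasDerivedFrom. The function names are made up for illustration and nothing here was tested against the item in question.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (assumed to be the one you want to use).
ENDPOINT = "https://query.wikidata.org/sparql"


def reference_query(qid, claim_pid="P569"):
    """Build a SPARQL query listing every reference property/value pair
    attached to the given claim (default: date of birth, P569)."""
    return (
        "SELECT ?refProp ?refValue WHERE {\n"
        "  wd:%s p:%s ?statement .\n"
        "  ?statement prov:wasDerivedFrom ?reference .\n"
        "  ?reference ?refProp ?refValue .\n"
        "}" % (qid, claim_pid)
    )


def query_url(qid, claim_pid="P569"):
    """Return a URL that asks the query service for JSON results."""
    params = {"query": reference_query(qid, claim_pid), "format": "json"}
    return ENDPOINT + "?" + urlencode(params)


print(query_url("Q26250609"))
```

The printed URL can be opened in a browser or used as the fetch URL in OpenRefine. The result would still need to be parsed, just like the Action API response.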

Hi @Pai_Dekkers, I think this is not possible using the Reconciliation API and its features. Alternatively, you can fetch the Wikidata Action API response ("Add column by fetching URLs...") for that item and parse the value you are interested in. Here is the link for your item: https://www.wikidata.org/w/api.php?action=wbgetentities&props=claims&ids=Q26250609&format=json, the path for your value is "entities►Q26250609►claims►P569►0►references►0►snaks►P5370►0►datavalue►value".

First, fetch the data for the Q-numbers in your column "wikidata_id" from Wikidata as JSON using 'https://www.wikidata.org/w/api.php?action=wbgetentities&props=claims&ids=' + cells['wikidata_id'].value + '&format=json'. Then you should be able to extract the value from the new column with value.parseJson()['entities'][cells['wikidata_id'].value]['claims']['P569'][0]['references'][0]['snaks']['P5370'][0]['datavalue']['value'].
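To make the two steps easier to follow outside OpenRefine, here is a minimal Python sketch of the same idea. SAMPLE is a hand-written, heavily trimmed stand-in for the real API response (only the keys on the path are kept), and the function names are hypothetical. No network request is made.

```python
import json

API = "https://www.wikidata.org/w/api.php"


def entity_url(qid):
    """Step 1: build the same URL the GREL expression builds for one Q-number."""
    return API + "?action=wbgetentities&props=claims&ids=" + qid + "&format=json"


# Hand-written stand-in for the fetched JSON; the real response has many more keys.
SAMPLE = json.dumps({
    "entities": {
        "Q26250609": {
            "claims": {
                "P569": [
                    {"references": [
                        {"snaks": {"P5370": [{"datavalue": {"value": "25411"}}]}}
                    ]}
                ]
            }
        }
    }
})


def reference_value(raw_json, qid, claim_pid="P569", ref_pid="P5370"):
    """Step 2: follow the same path as the value.parseJson() expression."""
    data = json.loads(raw_json)
    claim = data["entities"][qid]["claims"][claim_pid][0]
    snak = claim["references"][0]["snaks"][ref_pid][0]
    return snak["datavalue"]["value"]


print(entity_url("Q26250609"))
print(reference_value(SAMPLE, "Q26250609"))
```

Note that this, like the GREL expression, only looks at the first date-of-birth claim and its first reference. Items without that reference would raise an error here, or give an error or blank cell in OpenRefine.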

Best
Michael

Hi Michael,

Thank you for your response. The expression value.parseJson()['entities'][cells['wikidata_id'].value]['claims']['P569'][0]['references'][0]['snaks']['P5370'][0]['datavalue']['value'] returns the value '25411'. Do you know how to get the name of the source, i.e. "Entomologists of the World"/"P5370"?

And where do I start if I want to learn how to construct an expression like the one you provided? I see how it works but would have no idea how to create one for a different value in the Json data.

Thanks!

Hi, sure, my answer was too specific to be helpful in this regard.
The expression is a path to the value in the deeply nested Wikidata JSON, which you see when you click on the link provided in my answer. First I looked at the Wikidata page of Q26250609 to find the related Wikidata properties. Date of birth is Wikidata property P569 and Entomologists of the World ID is P5370 - you can see the Wikidata IDs by hovering over the respective name on the Wikidata page.

I searched for P569 in the JSON, then for P5370 within the P569 part, and then for the value ("25411") in question. The value.parseJson() expression does the same, but contains every 'level' of hierarchy starting from "entities". Most 'levels' have a name (like "P569" or "snaks"), while a [0] tells the script to look for the first item in a list.
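The 'level by level' idea can be sketched in a few lines of Python. This is only an illustration: SAMPLE is a hand-written, trimmed stand-in for the real response, and walk and walk_or_none are hypothetical helper names.

```python
# Trimmed, hand-written stand-in for the parsed Wikidata JSON.
SAMPLE = {
    "entities": {
        "Q26250609": {
            "claims": {
                "P569": [
                    {"references": [
                        {"snaks": {"P5370": [{"datavalue": {"value": "25411"}}]}}
                    ]}
                ]
            }
        }
    }
}

# Names are keys of an object, numbers are positions in a list (0 = first item).
PATH = ["entities", "Q26250609", "claims", "P569", 0,
        "references", 0, "snaks", "P5370", 0, "datavalue", "value"]


def walk(data, path):
    """Descend one level per step, exactly like chaining [...] in GREL."""
    current = data
    for step in path:
        current = current[step]
    return current


def walk_or_none(data, path):
    """Same as walk, but return None when a level is missing."""
    try:
        return walk(data, path)
    except (KeyError, IndexError, TypeError):
        return None


print(walk(SAMPLE, PATH))
print(walk_or_none(SAMPLE, ["entities", "Q26250609", "claims", "P570", 0]))
```

The second call shows why a missing level matters: an item without the property in question has no such key, so the path simply stops there.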

Here is how it looks if I copy the JSON to https://jsonpathfinder.com/ and go to the value you want. At the top is the full path, but formatted differently from the GREL expression.
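If you read the steps off such a path viewer, turning them into GREL is mostly mechanical. Below is a small, hypothetical helper (to_grel is not part of OpenRefine) that takes the steps as a Python list and writes out the bracket notation used above. It assumes you type the steps in yourself and does not parse the viewer's own path format.

```python
def to_grel(path, root="value.parseJson()"):
    """Turn a list of steps into a GREL expression string.

    Text steps become ['name'], integer steps become [0], [1], ...
    """
    parts = [root]
    for step in path:
        if isinstance(step, int):
            parts.append("[%d]" % step)
        else:
            parts.append("['%s']" % step)
    return "".join(parts)


steps = ["entities", "Q26250609", "claims", "P569", 0,
         "references", 0, "snaks", "P5370", 0, "datavalue", "value"]
print(to_grel(steps))
```

For a whole column you would still replace the fixed 'Q26250609' step with cells['wikidata_id'].value by hand, as in the expression from my first answer.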

I found this tutorial helpful at the beginning: Fetching and Parsing Data from the Web with OpenRefine | Programming Historian

Regarding your question on the source label: The source is already part of the path I used (Entomologists of the World ID - Wikidata), as you asked for the value of this particular property. Another source would have another property ID and hence another path to the value. Here it gets complicated, as you would need a more complex expression (most likely a Python script) to get all possible source property IDs for birth dates into a new column in OpenRefine. Afterwards you would need to fetch the labels for the different property IDs using the Action API again (e.g., https://www.wikidata.org/w/api.php?action=wbgetentities&props=labels&ids=P5370&format=json).
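A minimal sketch of what such a Python script might look like, under a few assumptions: ENTITY_SAMPLE and LABELS_SAMPLE are hand-written stand-ins for the two API responses (the second reference with P854 is invented for illustration), the function names are hypothetical, and the label is read from the "en" entry only.

```python
import json

API = "https://www.wikidata.org/w/api.php"

# Hand-written stand-in: one birth date claim with two references (the second is invented).
ENTITY_SAMPLE = json.dumps({"entities": {"Q26250609": {"claims": {"P569": [
    {"references": [
        {"snaks": {"P5370": [{"datavalue": {"value": "25411"}}]}},
        {"snaks": {"P854": [{"datavalue": {"value": "https://example.org/source"}}]}},
    ]}
]}}}})

# Hand-written stand-in for the props=labels response.
LABELS_SAMPLE = json.dumps({"entities": {"P5370": {"labels": {
    "en": {"language": "en", "value": "Entomologists of the World ID"}
}}}})


def reference_property_ids(entity_json, qid, claim_pid="P569"):
    """Collect every property ID used in any reference of any matching claim."""
    data = json.loads(entity_json)
    claims = data["entities"][qid].get("claims", {}).get(claim_pid, [])
    found = []
    for claim in claims:
        for reference in claim.get("references", []):
            for pid in reference.get("snaks", {}):
                if pid not in found:
                    found.append(pid)
    return found


def labels_url(pids):
    """Build one Action API URL that asks for the labels of several properties."""
    return API + "?action=wbgetentities&props=labels&ids=" + "|".join(pids) + "&format=json"


def english_label(labels_json, pid):
    """Read the English label of one property from the labels response."""
    return json.loads(labels_json)["entities"][pid]["labels"]["en"]["value"]


pids = reference_property_ids(ENTITY_SAMPLE, "Q26250609")
print(pids)
print(labels_url(pids))
print(english_label(LABELS_SAMPLE, "P5370"))
```

Inside OpenRefine the same logic would have to go into a Python / Jython expression that ends with a return statement, for example returning the IDs joined into one string. I have not tried this exact script there.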
