Resolve using https://hub.toolforge.org/

If a column holds the values of a Wikidata property, I have used hub.toolforge.org to resolve them to Wikidata QIDs.

Examples
1: Column with values matching Wikidata property P8433

  1. Hubtools --> https://hub.toolforge.org/P8433:G8095?format=json --> JSON
{
    "origin": {
        "id": "P8433:G8095",
        "value": "G8095",
        "properties": [
            "P8433"
        ],
        "qid": "Q98490206"
    },
    "destination": {
        "langs": [
            "en"
        ],
        "sites": [
            "wiki"
        ],
        "bestFallbackSitelink": {
            "site": "wikidatawiki",
            "title": "Q98490206",
            "score": 0.5
        },
        "url": "https://www.wikidata.org/wiki/Q98490206"
    }
}
  2. Add column by fetching URL

  3. Parse the QID from the fetched JSON

value.parseJson()['origin']["qid"]
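For anyone doing the same step outside OpenRefine, here is a minimal Python sketch of the parsing, assuming the response has the shape shown above. The function name `extract_qid` is made up for illustration; it returns None when the hub found no match or the response is not valid JSON.

```python
import json


def extract_qid(response_text):
    """Return the QID from a hub.toolforge.org JSON response, or None."""
    try:
        data = json.loads(response_text)
    except (TypeError, ValueError):
        # Not JSON at all (empty cell, error page, ...)
        return None
    if not isinstance(data, dict):
        return None
    origin = data.get("origin")
    if not isinstance(origin, dict):
        return None
    # Same lookup as the GREL expression: ['origin']['qid']
    return origin.get("qid")


# Trimmed copy of the response shown above
sample = (
    '{"origin": {"id": "P8433:G8095", "value": "G8095", '
    '"properties": ["P8433"], "qid": "Q98490206"}}'
)
print(extract_qid(sample))  # prints Q98490206
```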

Thanks for suggesting this. It has also helped me!

I did the following:

  • Start with a column of bare identifiers, in my case for the Wikidata P3638 property
  • Edit column > Add column by fetching URLs
  • Enter "https://hub.toolforge.org/P3638:" + value + "?format=json" to construct the URL, enter a column name, run and wait
  • This produces a new column containing JSON; then extract the QIDs using the following GREL: value.parseJson()['origin']["qid"] (note that your post above made the quotes curly, which doesn’t work)
  • Reconcile the QIDs via Reconcile > Use values as identifiers; choose the Wikidata reconciliation service.
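The fetch-and-parse steps above can also be sketched as a small Python script, in case someone wants to pre-resolve identifiers before importing. This is only a sketch under assumptions: the names `hub_url`, `fetch_text` and `resolve_identifiers` are hypothetical, the URL pattern is the one from the steps above, and percent-encoding the identifier is my assumption about what the hub accepts for values with special characters.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

HUB = "https://hub.toolforge.org/"


def hub_url(property_id, identifier):
    """Build the same URL as the expression in the steps above."""
    # Assumption: identifiers with special characters can be percent-encoded.
    return HUB + property_id + ":" + quote(str(identifier), safe="") + "?format=json"


def fetch_text(url):
    """Fetch a URL and return the body as text (does a real network call)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def resolve_identifiers(property_id, identifiers, fetch=fetch_text):
    """Map each identifier to its QID, or None if it cannot be resolved."""
    result = {}
    for identifier in identifiers:
        try:
            data = json.loads(fetch(hub_url(property_id, identifier)))
            result[identifier] = data["origin"]["qid"]
        except (OSError, ValueError, KeyError, TypeError):
            # Network error, non-JSON body, or no "qid" in the response
            result[identifier] = None
    return result
```

Calling `resolve_identifiers("P3638", column_values)` would hit the hub once per value, so for a long column it is worth adding a delay between requests, much like the throttle setting in the fetch dialog.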

Does anyone know how to do this in fewer steps? Omitting the JSON generation and parsing would be so awesome.

Maybe it should be a feature request → add a menu option for a column to reconcile its values to the Wikidata items that have a specific property with those values…

Similar to the option we have today, Use values as identifiers…

That's exactly how identifier/key-based "reconciliation" used to work with Freebase, although it's not really a reconciliation operation since it can be done with a single direct lookup. A QID is just a specific instance of a strong identifier. OpenRefine should be able to do lookups against any unique strong identifier.

Tom

It is also possible to do it with the Wikidata reconciliation service, but the feature is very hidden: you need to first create a column of names for your entities (if you do not have names, you can input some arbitrary string as a placeholder) and then reconcile that column, with the other column of identifiers configured to match the identifier property.
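To make that a bit more concrete, here is a rough Python sketch of what such a reconciliation query could look like on the wire, assuming the generic reconciliation service API query shape (a "query" string plus a "properties" list of pid/v pairs). The function name `build_queries` and the placeholder string are made up for illustration, and how strongly the Wikidata service weights the identifier match is not something this sketch shows.

```python
import json


def build_queries(rows, property_id, placeholder="x"):
    """Build a reconciliation query batch from (name, identifier) rows.

    The name column may be empty; a placeholder string is sent instead,
    and the identifier is passed as a property constraint.
    """
    queries = {}
    for index, (name, identifier) in enumerate(rows):
        queries["q%d" % index] = {
            "query": name or placeholder,
            "properties": [{"pid": property_id, "v": identifier}],
        }
    return queries


# One row with no name, only an identifier for P8433
rows = [("", "G8095")]
# The batch is normally sent as a form field named "queries"
payload = {"queries": json.dumps(build_queries(rows, "P8433"))}
print(payload["queries"])
```

In OpenRefine itself this corresponds to reconciling the name column and ticking the identifier column in the dialog so that it is matched against the property.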

There are some open issues about making that easier, such as:

@antonin_d Do you think we should document that feature under the Wikidata extension docs? It seems to be begging for a new section heading, perhaps called “Reconciling objects no matter their name” or something more appropriate, on the page “Reconciling with Wikibase | OpenRefine”.

Yes that would make sense.

@antonin_d Great, if you could quickly type up a paragraph, then I can review and add more to it. I would do it myself, but I’m asking you first because I am struggling a bit to actually understand the workflow steps you mentioned above. Perhaps it’s just me still being blind to the “no name entities” problem itself. Or maybe it should be called “missing name entities”. Dunno.

One feature I have been thinking of is the ability to “Add column by fetching destination URL” (the final URL in a redirect chain). While this might not be its primary use case, I could see it being useful here too.

The main use case I imagine would be to just update redirected URLs.

So you want the resolved destination URL, not its contents? Please create a feature request in the issue tracker. It seems like perhaps this could be generalized to return all headers, not just the Location: header, maybe using the HEAD verb instead of GET to make it more lightweight.
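As a rough illustration of the HEAD idea (not an OpenRefine feature, just a standalone Python 3 sketch), the following follows a redirect chain hop by hop and returns the final URL without downloading any bodies. The names `head_location` and `resolve_final_url` are hypothetical, and some servers answer HEAD differently from GET, so treat it as a starting point.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head_location(url, timeout=10):
    """Send one HEAD request; return the Location header on a redirect."""
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        if response.status in REDIRECT_CODES:
            return response.getheader("Location")
        return None
    finally:
        conn.close()


def resolve_final_url(url, get_location=head_location, max_hops=10):
    """Follow redirects until a URL no longer redirects; return that URL."""
    seen = {url}
    for _ in range(max_hops):
        location = get_location(url)
        if not location:
            return url
        # Location may be relative, so resolve it against the current URL
        url = urljoin(url, location)
        if url in seen:
            raise ValueError("redirect loop at " + url)
        seen.add(url)
    raise ValueError("more than %d redirects" % max_hops)
```

Passing `get_location` as a parameter keeps the chain-following logic separate from the network call, which also makes it easy to try out with a plain dictionary of known redirects.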

Tom

@abbe98 There are several examples of how to get redirect info using Python; try a GitHub search for ‘Python and HTTPRedirectHandler’.

Here’s just one snippet I hacked up and used in the Expression editor with Jython that is semi-working, but I leave it as an exercise to get it working fully :)

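import urllib2

# Handler that stops urllib2 from following redirects, so the 3xx
# response itself (with its Location header) is handed back to the caller.
class RedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        result = urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
        result.status = code
        return result
    http_error_301 = http_error_303 = http_error_307 = http_error_302

opener = urllib2.build_opener(RedirectHandler())
# 'value' is the cell value (the URL to check) in the expression editor
response = opener.open(value)

# Location header of the first redirect only (None if there was no redirect);
# following the rest of the chain is still left as an exercise
return response.info().getheader('Location')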

@thadguidry Yeah, I have such a solution, but I could imagine the use case being common enough to be in core.
