Error when adding columns from reconciled Wikimedia Commons values

Hi everyone,

I came across an error that occurs when adding columns from values that were reconciled with Wikimedia Commons.
I am using OpenRefine version 3.9.5 on Windows 11.

Steps to reproduce:

  • Reconcile a column with Commons files via the endpoint https://commonsreconcile.toolforge.org/en/api.
  • Choose "add columns from reconciled data" and pick a property for which you know that some entities lack an English label.
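
To check the endpoint outside OpenRefine, the data extension request can be sketched roughly as follows. This is a minimal sketch, not OpenRefine's actual code: it assumes the service accepts the "extend" form parameter of the reconciliation service API's data extension protocol, and the file ID M12345 is a hypothetical placeholder.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Endpoint taken from the steps above.
ENDPOINT = "https://commonsreconcile.toolforge.org/en/api"


def build_extend_request(entity_ids, property_ids, endpoint=ENDPOINT):
    """Build (but do not send) a data extension POST request.

    Assumption: the query is sent as a form field named "extend"
    whose value is a JSON object with "ids" and "properties".
    """
    payload = {
        "ids": list(entity_ids),
        "properties": [{"id": pid} for pid in property_ids],
    }
    body = urlencode({"extend": json.dumps(payload)}).encode("utf-8")
    return Request(endpoint, data=body, method="POST")


# "M12345" is a hypothetical file ID; use an ID from your reconciled column.
req = build_extend_request(["M12345"], ["P180"])

# Show the JSON query that would be posted.
print(parse_qs(req.data.decode("utf-8"))["extend"][0])

# To actually send it: urllib.request.urlopen(req)
# An HTTP 400 response would then raise urllib.error.HTTPError.
```

Sending the same request to the /de/api endpoint instead should show whether the failure depends on the endpoint language.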

Expected results:

  • OpenRefine adds a column with the reconciled Wikidata values of this property.
  • Where no label exists in the language of the endpoint used, it shows the Q-ID.
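
The fallback I would expect can be sketched like this. It only illustrates the expected behavior and is not how OpenRefine or the endpoint is implemented; the function name and the labels dictionary are made up for the example.

```python
def display_value(entity_id, labels, language):
    """Return the label in the requested language, or the Q-ID if it is missing.

    `labels` maps language codes to label strings (illustrative structure).
    """
    label = labels.get(language)
    return label if label else entity_id


# An item that only has a German label.
only_german = {"de": "Berlin"}

print(display_value("Q64", only_german, "de"))  # Berlin
print(display_value("Q64", only_german, "en"))  # Q64
```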

Observed results:

Adding the column is not possible because the preview window shows an error message.
[Screenshot: error message in the preview window]

Error message from the log:

02:58:25.279 [                   refine] POST /command/core/preview-extend-data (12960ms)
02:58:26.827 [                  command] Exception caught (1548ms)
java.io.IOException: HTTP error 400 : BAD REQUEST for URL /en/api
        at com.google.refine.util.HttpClient.postNameValue(HttpClient.java:264)
        at com.google.refine.model.recon.ReconciledDataExtensionJob.postExtendQuery(ReconciledDataExtensionJob.java:210)
        at com.google.refine.model.recon.ReconciledDataExtensionJob.extend(ReconciledDataExtensionJob.java:177)
        at com.google.refine.commands.recon.PreviewExtendDataCommand.doPost(PreviewExtendDataCommand.java:144)
        at com.google.refine.RefineServlet.service(RefineServlet.java:187)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
        at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1410)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:529)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1570)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:790)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1384)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1543)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1306)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at com.google.refine.ValidateHostHandler.handle(ValidateHostHandler.java:93)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
        at org.eclipse.jetty.server.Server.handle(Server.java:563)
        at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
        at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)

Sometimes, however, the operation seems to work. In that case, the "add columns from data" operation adds the data for some files but silently omits others. Although several files were skipped, the reconciliation result shows "100 % matched".

Workaround:

After adding English labels to the relevant Wikidata records where they were missing, the operation works as expected.
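
To find the records that need a label, something like the following could help. It is a minimal sketch that builds a query for the Wikidata wbgetentities API and inspects a parsed JSON response; the item IDs Q111 and Q222 and the sample response are hypothetical illustration data, not real results.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def labels_query_url(item_ids, language):
    """Build a wbgetentities URL that requests labels in one language only."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(item_ids),
        "props": "labels",
        "languages": language,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def missing_label_ids(response, language):
    """Return the sorted IDs of entities that have no label in `language`."""
    missing = []
    for item_id, entity in response.get("entities", {}).items():
        labels = entity.get("labels") or {}  # tolerate an absent or empty value
        if language not in labels:
            missing.append(item_id)
    return sorted(missing)


# Hypothetical response, shaped like the API's JSON output.
sample = {
    "entities": {
        "Q111": {"labels": {"en": {"language": "en", "value": "example"}}},
        "Q222": {"labels": {}},
    }
}

print(labels_query_url(["Q111", "Q222"], "en"))
print(missing_label_ids(sample, "en"))  # ['Q222']
```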

So it seems that this behavior depends on the reconciliation endpoint used, https://commonsreconcile.toolforge.org/en/api. In short: the error occurs when an entity has no label in the language of the endpoint used.

For example: take a Wikimedia Commons file that has a P180 item with only a German label. Adding data from the reconciliation works when the reconciliation used the German endpoint https://commonsreconcile.toolforge.org/de/api, but with the English endpoint or any other language it produces the error. Adding a default label for all languages to the Wikidata record does not solve the problem.

I hope this description is helpful.

Best regards
Claus