Hi everyone,
I came across an error when adding columns from values that were reconciled against Wikimedia Commons.
I am using OpenRefine Version 3.9.5 on Windows 11.
Steps to reproduce:
- Reconcile a column with Commons files via the endpoint https://commonsreconcile.toolforge.org/en/api.
- Choose "add columns from reconciled data" and pick a property for which you know that some entities lack an English label.
Expected results:
- OpenRefine adds a column with the reconciled Wikidata values of this property.
- Where no label exists in the endpoint's language, the column shows the Q-ID.
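To make the expected behaviour concrete, here is a minimal sketch of the fallback. It assumes item data in the Wikidata JSON shape (`labels` keyed by language code, each entry holding a `value`). `display_label` is a hypothetical helper for illustration, not code from OpenRefine or from the commonsreconcile endpoint.

```python
def display_label(entity, lang):
    """Return the entity's label in `lang`, falling back to its ID (e.g. a Q-ID)."""
    label = entity.get("labels", {}).get(lang)
    if label and label.get("value"):
        return label["value"]
    # No label in the requested language: return the bare ID instead of failing.
    return entity["id"]
```

For an item that only has a German label, `display_label(item, "de")` would return the German label and `display_label(item, "en")` would return the Q-ID.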
Observed results:
The operation fails, and an error message appears in the preview window.
Error message from the log:
```
02:58:25.279 [ refine] POST /command/core/preview-extend-data (12960ms)
02:58:26.827 [ command] Exception caught (1548ms)
java.io.IOException: HTTP error 400 : BAD REQUEST for URL /en/api
at com.google.refine.util.HttpClient.postNameValue(HttpClient.java:264)
at com.google.refine.model.recon.ReconciledDataExtensionJob.postExtendQuery(ReconciledDataExtensionJob.java:210)
at com.google.refine.model.recon.ReconciledDataExtensionJob.extend(ReconciledDataExtensionJob.java:177)
at com.google.refine.commands.recon.PreviewExtendDataCommand.doPost(PreviewExtendDataCommand.java:144)
at com.google.refine.RefineServlet.service(RefineServlet.java:187)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1410)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:529)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1570)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:790)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1384)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:484)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1543)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1306)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at com.google.refine.ValidateHostHandler.handle(ValidateHostHandler.java:93)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.Server.handle(Server.java:563)
at org.eclipse.jetty.server.HttpChannel$RequestDispatchable.dispatch(HttpChannel.java:1598)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:753)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:501)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
```
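The trace shows the 400 coming back from a form POST (`postNameValue`) to `/en/api`. To check whether the endpoint itself rejects the query, the request could be replayed outside OpenRefine. This is a rough sketch under assumptions: that the endpoint accepts a Reconciliation Service API style data-extension query sent as a form-encoded `extend` parameter, and that the payload resembles what OpenRefine sends (not verified against OpenRefine's code). `build_extend_body` and `post_extend` are hypothetical helpers.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the endpoint takes a data-extension query as a form field "extend".
ENDPOINT = "https://commonsreconcile.toolforge.org/en/api"


def build_extend_body(entity_ids, property_ids):
    """Form-encode an extend query for the given entity and property IDs."""
    query = {
        "ids": list(entity_ids),
        "properties": [{"id": pid} for pid in property_ids],
    }
    return urllib.parse.urlencode({"extend": json.dumps(query)}).encode("utf-8")


def post_extend(endpoint, body, timeout=30):
    """POST the encoded query and return (HTTP status, response text)."""
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # An HTTP 400 like the one in the log lands here; the body may say why.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Usage, with a hypothetical media ID that should be replaced by the ID of an affected file: `post_extend(ENDPOINT, build_extend_body(["M12345"], ["P180"]))`. Running the same call against `/de/api` would show whether the result depends on the endpoint language.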
Sometimes, however, the operation seems to work. In that case, some files are skipped by the "add columns from data" operation while data is added for others. Even though several files were skipped, the reconciliation result shows "100 % matched".
Workaround:
After adding English labels to the relevant Wikidata items where they were missing, the operation works as expected.
So this behavior seems to depend on the reconciliation endpoint used (https://commonsreconcile.toolforge.org/en/api). In short: the error occurs when an item has no label in the endpoint's language.
For example: a Commons file has a P180 (depicts) item that only has a German label. Adding data from the reconciled values works when the reconciliation used the German endpoint https://commonsreconcile.toolforge.org/de/api, but with the English endpoint, or any other, it produces the error. A default label for all languages on the Wikidata item does not solve the problem.
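For the workaround, it helps to know which items lack a label in the endpoint's language. The sketch below assumes the JSON returned by the Wikidata API's `wbgetentities` action with `props=labels` (entities keyed by ID under `entities`, each with a `labels` map). `ids_missing_label` is a hypothetical helper, and it only checks the exact language code, which mirrors (but does not prove) the observation that a default label did not help.

```python
def ids_missing_label(response, lang):
    """List entity IDs in a wbgetentities response that have no label in `lang`."""
    missing = []
    for entity_id, entity in response.get("entities", {}).items():
        # Only the exact language code counts; no fallback or default label.
        if lang not in entity.get("labels", {}):
            missing.append(entity_id)
    return sorted(missing)
```

Running it with `lang="en"` on the P180 items of the skipped files would list the items that need an English label before the English endpoint can be used.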
I hope the description is helpful.
Best regards
Claus
