Moving forward with our data upload to a Wikibase, we hit a little snag. We have a file with 2500+ lines, most of which had been uploaded before. In order to cleanly import updated data, we deleted all relevant entries from the Wikibase. This worked easily.
We import the data into OpenRefine, use the previously created schema, make sure the relevant data is reconciled, and try to upload. There are issues, including invalid date formats (years that are 0) and qualifiers that are ignored (because some cells are blank), but nothing that should get in the way of the upload. We would expect all the data to upload properly, as it had in the past.
However, out of the 21 items that should receive data (this ties in with this issue, where we mention that the low number of "changes" reported can be confusing, because it only counts the items updated, while many statements are actually created), 3 do not receive updates. Note that, from the data file, it looks like 22 items should be affected.
As far as we can see, the three items in question are not linked to the issues mentioned above: both the empty cells and the years set to 0 affected many more items than just these three. Faceting to one of the items that did not upload and trying to upload it on its own also does not work. The "preview" tab shows the edits correctly, and no information appears in "Wikibase editing results" after our attempts to upload the edits.
Not really sure what to make of it at this point. Any ideas?
I assume you have already checked the "Wikibase editing results" column and it does not show any error?
Have you also looked in the server logs? It could be that certain types of errors are still only reported there.
Have you also tried to re-upload only the edits on items that were skipped?
Could there be any data on the items that is considered a duplicate of the data you are uploading, which would explain the skipped edits?
Hi @antonin_d,
Thanks for following up, as always!
Yes, as written, we have indeed checked that column. We also tried to facet one of the items that did not upload and upload it specifically, and that did not work. And the data should not be duplicated, since we deleted all similar statements beforehand.
The one thing you mention that we have not done is check the server logs, and there I would not know where to start. Since our Wikibase is hosted on Wikibase.cloud, I am not sure we can do this; can we?
Best,
Mike
Edit: we figured out the "21 edits instead of 22" and it's just a data thing, so not related to OpenRefine. Additionally, though, we note that the three items that were not updated are the ones with the largest number of lines (from 379 to 555, while the largest number for an item that was actually updated is 274).
By server logs, I mean the OpenRefine server logs. On Windows and Linux, this is the text that appears in the terminal when you launch OpenRefine. On macOS it is not shown by default, but you can run OpenRefine from a terminal to see it.
The plot thickens, very interesting! So it looks like Wikibase is returning some sort of error, likely in HTML or XML if the response starts with <. That sounds rather unexpected, as error messages are normally returned in a JSON format.
I guess the next step would be to try and figure out what this response is. We could make a custom version of OpenRefine that dumps the entire response to the logs in such a case. It's a bit of effort but I could do it if you are motivated to keep investigating further.
@antonin_d Should we add a Level.DEBUG option for Wikidata edit responses to be logged in full? That way, anyone in the future (user or developer) could flip the logging level for Wikibase edits to DEBUG and get verbose logs for troubleshooting, while the default stays at Level.INFO.
Yes, ideally something like this should be debuggable by changing the logging level. However, in this particular case, the Wikidata-Toolkit library itself would need to be changed to report the full text of the response. Although that is something we can technically do, I wouldn't say it's really desirable: it could be a security issue, and logging large amounts of text has its costs.
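For illustration, the sort of guarded logging we are talking about could look roughly like the sketch below. The class and method names are hypothetical, not taken from the actual Wikidata-Toolkit code; it only shows the general pattern of gating the full response dump behind the DEBUG level.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch of level-guarded logging of a Wikibase edit response.
// None of these names come from the real Wikidata-Toolkit code base.
public class EditResponseLogger {

    private static final Logger logger = LoggerFactory.getLogger(EditResponseLogger.class);

    public void handleResponse(String rawResponseBody) {
        // Only pay the cost of dumping the full body when DEBUG is enabled;
        // at the default INFO level this branch is skipped entirely.
        if (logger.isDebugEnabled()) {
            logger.debug("Full Wikibase edit response:\n{}", rawResponseBody);
        }
        // ... normal parsing of the response would continue here ...
    }
}
```

With something like this, the logging level for the relevant package only needs to be raised to DEBUG while troubleshooting, which addresses the verbosity cost but not the security concern (tokens or other sensitive content could still end up in the dump).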
@epfo I have made custom snapshots with this additional logging:
Ok! Then this is likely something specific to the Wikibase.Cloud platform. There is probably a limit on the size of POST request bodies somewhere between the HTTP server and Wikibase. One could ask WMDE to bump this limit so that such edits go through.
As a workaround, you could try uploading those edits via QuickStatements, which doesn't make such big edits when editing existing entities.
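If you want to check whether a request-size limit is really the culprit, a rough probe along these lines might help: it sends harmless API requests with increasingly large POST bodies and reports where the response stops looking like JSON. This is only a diagnostic sketch; the endpoint URL is a placeholder for your instance, and the assumption that an oversized request comes back as an HTML error page is mine, not something documented for Wikibase.cloud.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PostSizeProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder: replace with the api.php of your Wikibase.cloud instance.
        String endpoint = "https://example.wikibase.cloud/w/api.php";
        HttpClient client = HttpClient.newHttpClient();

        // POST bodies of increasing size; the extra "padding" parameter is
        // ignored by the API, so a small request should come back as JSON.
        for (int kb : new int[] {64, 256, 1024, 4096}) {
            String body = "action=query&meta=siteinfo&format=json&padding="
                    + "x".repeat(kb * 1024);
            HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            String start = response.body()
                    .substring(0, Math.min(40, response.body().length()))
                    .replace('\n', ' ');
            System.out.printf("%d KiB -> HTTP %d, body starts with: %s%n",
                    kb, response.statusCode(), start);
        }
    }
}
```

If the small bodies come back as JSON and the large ones as something starting with `<`, that would be fairly strong evidence for a size limit in front of the wiki.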
I see. Maybe that's a new thing, because I had already uploaded this data and only later deleted it to re-upload it with extra details. Anyway, I can try faceting by year to break it down. Let's see if that works.
Faceting by year proved to be a workaround: splitting the batch in two brought the number of edits per item below the unknown threshold, and the edits went through for all three items.