Sometimes OpenRefine 'forgets' to add structured data to Wikimedia Commons and I do not know why

Hi everyone,

I am a first-time OpenRefine user and am using it to upload a batch of photos to Wikimedia Commons. I have a fairly basic Google Sheets spreadsheet listing a few JPG files and the corresponding data needed to upload them to Wikimedia Commons with OpenRefine.

Most of the time it works just fine, but every now and then a photo is uploaded without most of the structured data I added in Google Sheets. I add things like coordinates, "depicts", author, source, and description. All of that is missing, although the infobox specifying the license does show up. See an example here: File:EF010 Effretikon MaxVogt 20240807 151918.jpg - Wikimedia Commons

What I do not understand is that other files, from the same upload batch from the same google sheet, are uploaded with all the structured data I provided, see this file (I am told I can only provide two URLs per post as a new user, so here is just a screenshot instead):

I use a single Google Sheet for the upload, and reconciliation in OpenRefine runs without errors, yet some photos end up with missing information while others do not.

Here is a link to the spreadsheet I use to gather the data: EF010 Import Effretikon Station Building - Google Sheets

Any help on why this is happening would be appreciated, because finding out what is missing and then adding it later on is a tiresome process!

Best regards
Moritz

Hi Moritz,

Thanks a lot for reporting this problem! I suspect this could be related to some timing issues in the publication process (with OpenRefine trying to add the structured data or wikitext before the page is actually published).

If this is indeed the root cause of the issue, then you are in luck: @Sebastian has been working on this as a side-effect of his work on supporting larger uploads (> 100MB). I wasn't aware of anyone running into this problem so far, but what you are describing sounds a lot like it.

The corresponding fix is already available in our snapshot releases.

In addition to that, the snapshot releases will also let you keep track of any errors which happen during the upload process (whereas you need to look at the server logs in previous versions). Perhaps that could shed light on the source of the problem, if it is not this timing issue.

Hi Antonin,

Thanks for the quick reply. I tried the latest snapshot with another Google Sheets metadata spreadsheet, with the same fields and a comparable number of files, but encountered the same problem: see this category. Out of eleven files, five were uploaded without the structured data, and six got all the information I provided in Google Sheets and reconciled with OpenRefine. Within Google Sheets, most of the column values are copied and pasted across all items, so there should be no formatting issue on my side, especially since reconciliation works without problems.

Any other ideas?

That's curious. I assume the "Wikibase editing results" column doesn't contain any error for this new batch of files? In that case it could still be worth looking at the server logs (while uploading a new batch - sorry) and reporting their contents here, especially because @Sebastian has added some helpful logging statements to understand better what is going on under the hood.

OK, I found the error, and of course it was me: the files with the missing structured data were records where I had written a caption exceeding the Wikimedia Commons length limit for captions (250 characters)!

I wonder if it is feasible either to check the input length of certain fields, or to abort the upload completely if something fails, rather than proceeding with the rest without any visible error output in OpenRefine. Or perhaps a feature that displays the number of characters in each cell? I wrote the content directly into my Google Sheets file, so I would otherwise need to check the caption length separately myself, cell by cell.

Great that you figured it out! It would definitely make sense for OpenRefine to check that captions aren't too long before the upload, so that you don't get bad surprises. I have opened an issue about that:

Thanks! Yes, definitely. Abort upload or at least have a prominent error message within OpenRefine.
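
Until such a check exists in OpenRefine, caption lengths can be verified before starting the upload. The following is a minimal sketch, not an OpenRefine feature: it assumes the spreadsheet has been exported to CSV and that the captions sit in a column named "caption" (both the column name and the helper names are hypothetical). It uses the 250-character limit mentioned above and counts characters the way Python's `len()` does, which may differ slightly from how the server counts.

```python
import csv

# Caption length limit as quoted in this thread.
CAPTION_LIMIT = 250


def find_long_captions(rows, caption_column="caption", limit=CAPTION_LIMIT):
    """Return (row_number, length) pairs for captions longer than `limit`.

    `rows` is any iterable of dicts, e.g. a csv.DictReader.
    `caption_column` is an assumed column name; adjust it to your sheet.
    """
    too_long = []
    # Start at 2 so the numbers match spreadsheet rows (row 1 is the header).
    for row_number, row in enumerate(rows, start=2):
        caption = (row.get(caption_column) or "").strip()
        if len(caption) > limit:
            too_long.append((row_number, len(caption)))
    return too_long


def check_csv(path, caption_column="caption"):
    """Run the check on a CSV export of the spreadsheet."""
    with open(path, newline="", encoding="utf-8") as handle:
        return find_long_captions(csv.DictReader(handle), caption_column)
```

Calling `check_csv("export.csv")` on a hypothetical export returns the spreadsheet rows whose captions need shortening. Inside OpenRefine itself, a custom text facet on the caption column with the GREL expression `value.length() > 250` should give a similar overview without leaving the project.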