Hi, I am currently trying to familiarize myself with Wikimedia Commons uploads using OpenRefine 3.7.3 on Windows 10. My first use case is to upload additional image versions to existing Commons File objects.
Recently, I uploaded ~200 images with Pattypan. Accidentally, instead of the original image size, I uploaded smaller image versions. Now I would like to enhance the existing file objects with the original-sized images.
Is this possible with OpenRefine? My sample OR project has two rows. I reconciled the file name column with Wikimedia Commons. The file path column contains the path to the larger image version on my local disk. I created a schema that contains only file name and file path:
I am able to log in to Wikimedia Commons with a bot password. When I click "Upload edits to Wikibase", it looks like something is uploaded. But when I look at the File object, nothing has changed. Here is the output of the OpenRefine log:
17:55:01.612 [ refine] POST /command/wikidata/perform-wikibase-edits (6ms)
17:55:01.614 [..mWikibaseEditsOperation] Performing edits (2ms)
17:55:01.614 [..ting.EditBatchProcessor] Requesting documents (0ms)
17:55:02.665 [ refine] GET /command/core/get-history (1051ms)
17:55:02.677 [ refine] GET /command/core/get-project-metadata (12ms)
17:55:02.680 [ refine] GET /command/core/get-models (3ms)
17:55:02.683 [ refine] POST /command/core/get-all-preferences (3ms)
17:55:02.686 [ refine] POST /command/core/get-rows (3ms)
17:55:02.686 [ refine] POST /command/core/compute-facets (0ms)
17:55:02.704 [ refine] GET /command/core/get-preference (18ms)
17:55:02.722 [ refine] GET /command/core/get-csrf-token (18ms)
17:55:02.724 [ refine] POST /command/wikidata/preview-wikibase-schema (2ms)
Am I doing something wrong? Or is it simply not possible to upload additional file versions with OpenRefine?
I am afraid this is not a use case we support yet. But it would intuitively make sense to me to add support for that. I have opened a ticket about this: Support uploading new version of media files · Issue #5959 · OpenRefine/OpenRefine · GitHub
In the meantime, if Pattypan does not support uploading new versions of files, perhaps the simplest option is to ask for deletion of that batch and reupload it?
Hi @antonin_d ,
Thank you very much! Adding this feature would be very helpful.
For the moment, I will try your suggested option.
It's indeed a very valid request and your question made me curious! While developing Wikimedia Commons features in OpenRefine last year, we indeed briefly discussed allowing overwriting files, but then decided to not implement it (yet). I can't exactly remember why; perhaps just because of insufficient time and/or because I was not sure if it would be an often-requested use case.
Antonin's suggestion is indeed one possible approach at this moment (deletion request, and re-uploading), but if your files are already used to illustrate Wikipedia articles or Wikidata items, it would remove them there automatically too, which is a pity.
I think some Wikimedians do batch overwrites with a bot. I am currently asking around.
If there are not too many files, you could also overwrite them manually, one by one, on each file page (yes, tedious, and not doable if you have hundreds or thousands of files...).
How much is the Wikimedia Commons integration being used?
Last year I tried to match and upload 100+ pictures of Swedish members of parliament (PMs), and it was not straightforward:
- missed preview
- I had to do a lot of cleaning afterwards; it feels easier to just upload first and match afterwards
They were pictures like the ones in this category: Category:Porträttbok: Riksdagsmän 1906 - Wikimedia Commons
See GitHub - salgo60/spa2Commons: find pictures in SPA and upload on Wikicommons based on https://github.com/kaldari/iNaturalist2Commons
Those are fair questions and observations. I will soon announce a new project to create better documentation and train more people to use the features.
Very roughly, until now, the Commons features in OpenRefine have been used to upload 60,000+ files to Commons (most by GLAMs and Wikimedia affiliates). There have been around 250,000 OpenRefine-powered edits on Commons with OpenRefine 3.7.x so far.
Sure, but I feel my use case should be a perfect match for OpenRefine:
- we have 3,500 Swedish PMs in Wikidata; WD is the most complete digital source
- we have a person who has scanned more than 900,000 pictures of Swedish people in SPA (Svenskt Porträttarkiv)
- many of those pictures have a source stating that the person is a Swedish PM
- it has structured fields for names
- it has the text OCRed
- Wikidata has a property for SPA, Swedish Portrait Archive ID - Wikidata, which is used more than 14,000 times
- they don't have the person as an object; instead they say that the person in this picture is the same person as in the picture with id xxx -> if we have one ID in WD, we can also find more pictures that people in SPA have marked as showing the same person
It feels like all those pieces should make it much easier to match with OpenRefine, but I gave up and did a lot of cleaning manually afterwards. After that, I tested extracting the pictures with a notebook, uploading them in batch, and doing the cleaning/matching on Wikimedia Commons: spa2Commons/Notebook/Sveriges Riksdag i SPA.ipynb at main · salgo60/spa2Commons · GitHub. In a notebook I also get a preview of the pictures.
- the nice thing with massupload is that duplicates are directly identified
I think it is clear that many things in this integration can be improved and your feedback can very much help with that.
When you say "missed preview", what sort of preview are you thinking about? Is it that you'd need thumbnails for the files shown in OpenRefine before the upload? Or are you referring to a preview of the rendered wikitext? Or something else?
What sort of cleaning did you have to do after the upload?
Which sorts of duplicates? Files that you are uploading which are identical to existing files on Commons? Or naming conflicts? Or something else?
For future people who are interested in doing this (overwriting a larger batch of Commons files):
- Here's a short thread on the Wikimedia Commons Help desk on the topic.
- General Wikimedia Commons guidelines about overwriting existing files: Commons:Overwriting existing files - Wikimedia Commons
- There are unfortunately no user-friendly tools with which you can do this in a more or less 'point and click' way, but it can be done via script or bot.
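To give an idea of what "via script" could look like, here is a rough sketch, not a tested bot. It only prepares the parameters for the MediaWiki `action=upload` API call, using `ignorewarnings` so that an upload under an existing file name is stored as a new version rather than rejected with a warning. The helper name `build_overwrite_requests`, the example file name, the local path, and the token value are all made up for illustration. Actually sending the request (an authenticated session, a real CSRF token, and a multipart POST with the image in the `file` field) is left out, and the Commons guidelines on overwriting still apply.

```python
from pathlib import Path

# Public API endpoint of Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_overwrite_requests(rows, csrf_token, comment):
    """Prepare one action=upload request description per row.

    rows: iterable of (file name on Commons, path of the larger local file),
          i.e. the two columns of the sample project in this thread.
    csrf_token: a CSRF token obtained beforehand from the API (assumed here).
    comment: the upload comment shown in the file history.
    """
    prepared = []
    for file_name, local_path in rows:
        # The API expects the target name without the "File:" namespace prefix.
        title = file_name
        if title.startswith("File:"):
            title = title[len("File:"):]
        params = {
            "action": "upload",
            "format": "json",
            "filename": title,
            "comment": comment,
            # Without this, uploading under an existing name only returns a warning.
            "ignorewarnings": "1",
            "token": csrf_token,
        }
        prepared.append(
            {"url": COMMONS_API, "data": params, "file_path": Path(local_path)}
        )
    return prepared


if __name__ == "__main__":
    # Hypothetical row and placeholder token, for illustration only.
    example_rows = [("File:Example portrait.jpg", "originals/Example portrait.jpg")]
    for req in build_overwrite_requests(
        example_rows, "PLACEHOLDER_TOKEN", "Upload original-sized version"
    ):
        print(req["data"]["filename"], "<-", req["file_path"].name)
```

Each prepared entry would then be sent as a multipart POST by whatever HTTP client or bot framework you use, ideally with a pause between uploads.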
I have the impression that overwriting files is a rare-ish use case. I'm curious how often folks would want to do this, and whether it is also relevant for Wikibases in general. I suggest we continue discussing this at the GitHub issue that Antonin started.