Upload new image version to existing file on Wikimedia Commons

Hi, I am currently trying to familiarize myself with Wikimedia Commons uploads using OpenRefine 3.7.3 on Windows 10. My first use case is to upload additional image versions to existing Commons File objects.

Recently, I uploaded ~200 images with Pattypan. Accidentally, instead of the original image size, I uploaded smaller image versions. Now I would like to enhance the existing file objects with the original-sized images.

Is this possible with OpenRefine? My sample OR project has two rows. I reconciled the file name column with Wikimedia Commons. The file path column contains the path to the larger image version on my local disk. I created a schema that contains only file name and file path.

I am able to log in to Wikimedia Commons with a bot password. When I click "Upload edits to Wikibase", it looks like something is uploaded. But when I look at the File object, nothing has changed. Here is the output of the OpenRefine log:

17:55:01.612 [ refine] POST /command/wikidata/perform-wikibase-edits (6ms)
17:55:01.614 [..mWikibaseEditsOperation] Performing edits (2ms)
17:55:01.614 [..ting.EditBatchProcessor] Requesting documents (0ms)
17:55:02.665 [ refine] GET /command/core/get-history (1051ms)
17:55:02.677 [ refine] GET /command/core/get-project-metadata (12ms)
17:55:02.680 [ refine] GET /command/core/get-models (3ms)
17:55:02.683 [ refine] POST /command/core/get-all-preferences (3ms)
17:55:02.686 [ refine] POST /command/core/get-rows (3ms)
17:55:02.686 [ refine] POST /command/core/compute-facets (0ms)
17:55:02.704 [ refine] GET /command/core/get-preference (18ms)
17:55:02.722 [ refine] GET /command/core/get-csrf-token (18ms)
17:55:02.724 [ refine] POST /command/wikidata/preview-wikibase-schema (2ms)

Am I doing something wrong? Or is it simply not possible to upload additional file versions with OpenRefine?

Hi @Annabelle_Wiegart,

I am afraid this is not a use case we support yet. But it would intuitively make sense to me to add support for that. I have opened a ticket about this: Support uploading new version of media files · Issue #5959 · OpenRefine/OpenRefine · GitHub

In the meantime, if Pattypan does not support uploading new versions of files, perhaps the simplest option is to ask for deletion of that batch and re-upload it?

Hi @antonin_d,

Thank you very much! Adding this feature would be very helpful.

For the moment, I will try your suggested option.

Hi Annabelle!

It's indeed a very valid request and your question made me curious! While developing Wikimedia Commons features in OpenRefine last year, we indeed briefly discussed allowing overwriting files, but then decided to not implement it (yet). I can't exactly remember why; perhaps just because of insufficient time and/or because I was not sure if it would be an often-requested use case.

Antonin's suggestion is indeed one possible approach at this moment (deletion request, and re-uploading), but if your files are already used to illustrate Wikipedia articles or Wikidata items, it would remove them there automatically too, which is a pity.

I think some Wikimedians do batch overwrites with a bot. I am currently asking around.

If there are not too many files, you could also overwrite them manually, one by one, on each file page (yes, tedious, and not doable if you have hundreds or thousands of files...).

How much is the Wikimedia Commons integration being used?

Last year I tried to match 100+ pictures of Swedish members of parliament and upload them, and it was not straightforward:

  1. missed preview
  2. had to do a lot of cleaning afterwards; it feels easier to just upload and match afterwards

The pictures were like the ones in this category: Category:Porträttbok: Riksdagsmän 1906 - Wikimedia Commons

I have written a JavaScript integration like the one used for iNaturalist, i.e. it takes one picture at a time, but that is an easier process.
See GitHub - salgo60/spa2Commons: find pictures in SPA and upload on Wikicommons based on https://github.com/kaldari/iNaturalist2Commons

Hi Magnus,

Those are fair questions and observations. I will soon announce a new project to create better documentation and train more people to use the features.

Very roughly, until now, the Commons features in OpenRefine have been used to upload 60,000+ files to Commons (most by GLAMs and Wikimedia affiliates). There have been around 250,000 OpenRefine-powered edits on Commons with OpenRefine 3.7.x so far.

Using JavaScript and bots is not available to everyone, and I think OpenRefine does fill a gap for people and organizations who don't have these skills. But OpenRefine is certainly not the only tool that can get the job done, and for many kinds of uploads other tools and processes may indeed be a lot more suitable.

Sure, but I feel my use case should be a perfect match for OpenRefine:

  • we have 3500 Swedish members of parliament in Wikidata; WD is the most complete digital source
  • we have a person who has scanned more than 900,000 pictures of Swedish people in SPA - Svenskt Porträttarkiv
    • many of those pictures have a source stating that the person is a Swedish member of parliament
  • it has structured fields for names
  • it has the text OCRed
  • they have an API, which I use for my Wikimedia Commons JavaScript gadget SpA2Commons
  • Wikidata has a property for SPA, Swedish Portrait Archive ID - Wikidata, that is used more than 14,000 times
  • they don’t have the person as an object; instead they say that the person in this picture is the same person as in the picture with id xxx, so if we have one ID in WD we can also find more pictures that people in SPA have marked as showing the same person

It feels like all those pieces should make it much easier to match with OpenRefine, but I gave up and did a lot of cleaning manually afterwards. After that I tested extracting the pictures with a notebook, uploading them in batch, and doing the cleaning/matching on Wikimedia Commons: spa2Commons/Notebook/Sveriges Riksdag i SPA.ipynb at main · salgo60/spa2Commons · GitHub. In a notebook I also get a preview of the pictures.

  • the nice thing about mass upload is that duplicates are identified directly

I think it is clear that many things in this integration can be improved and your feedback can very much help with that.

When you say "missed preview", what sort of preview are you thinking about? Is it that you'd need thumbnails for the files shown in OpenRefine before the upload? Or are you referring to a preview of the rendered wikitext? Or something else?

What sort of cleaning did you have to do after the upload?

Which sorts of duplicates? Files that you are uploading which are identical to existing files on Commons? Or naming conflicts? Or something else?

For future people who are interested in doing this (overwriting a larger batch of Commons files):

Here's a short thread on the Wikimedia Commons Help desk on the topic.

  • General Wikimedia Commons guidelines about overwriting existing files: Commons:Overwriting existing files - Wikimedia Commons
  • There are unfortunately no user-friendly tools with which you can do this in a more or less 'point and click' way, but it can be done via script or bot.
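
For anyone who wants to try the script route, here is a minimal sketch in Python of what such a request could look like. It only builds the parameters for the MediaWiki `action=upload` API call; logging in, fetching a CSRF token and sending the request are left out. The function name `build_overwrite_request`, the example file name and the local path are assumptions made up for illustration, and none of this is how OpenRefine or Pattypan work internally. Please read the overwriting guidelines linked above before running anything like this on a batch.

```python
from pathlib import Path

# Public API endpoint of Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_overwrite_request(file_title, local_path, csrf_token, comment):
    """Build the POST parameters for uploading a new version of an existing file.

    Hypothetical helper: returns (params, path). The caller still has to send
    the request from a logged-in session, e.g. as multipart/form-data with the
    file contents in the "file" field.
    """
    # The API expects the bare file name, without the "File:" namespace prefix.
    name = file_title[len("File:"):] if file_title.startswith("File:") else file_title
    params = {
        "action": "upload",
        "format": "json",
        "filename": name,
        "comment": comment,      # shown in the file history for the new version
        # Without this flag the API stops with an "exists" warning
        # instead of storing the new version.
        "ignorewarnings": "1",
        "token": csrf_token,
    }
    return params, Path(local_path)


params, path = build_overwrite_request(
    "File:Example.jpg",              # assumed title of the existing Commons file
    "C:/images/Example.jpg",         # assumed path to the original-size image
    "TOKEN",                         # placeholder, a real CSRF token is required
    "Upload original-size version",
)
```

Looping over the rows of a spreadsheet (existing file title, local path) and sending one such request per row would give a simple batch overwrite, subject to the usual bot and rate-limit rules on Commons.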

I have the impression that overwriting files is a rare-ish use case. I'm curious how often folks would want to do this, and whether it is also relevant for Wikibases in general. I suggest we continue discussing this at the GitHub issue that Antonin started.

Hi!

To give an example, I'm uploading maps for Italian municipalities. It could happen that there is a mistake for a province and I need to upload an updated version for all the 150+ municipalities of that province. It would be really nice to have such an option.

I think it wouldn't be too much work to add support for this in OpenRefine (but I haven't looked into it very closely). It's mostly a design question: is the feature going to be discoverable enough, and how do we minimize the risk that people upload new versions by mistake? Even if it's a rare use case, I think it's still worth providing this: OpenRefine should let you override everything that you upload, so you don't have to get it 100% right the first time.

Currently looking into this one and thinking a bit about when you may assume that the intention is to overwrite a file.

My current thinking is the same as what @antonin_d expressed on GitHub, i.e. "OpenRefine should only accept to upload a new version if the [filename] cell is reconciled to a matched item. Otherwise, it could be that the user believes they will upload a brand new file, and they end up overwriting an existing one."

But my question is: if you have reconciled the filename against a matched file and you provide a file path, can it be implied that you always want to overwrite the file? (If yes, you wouldn't need an additional tickbox in the schema to say you want to overwrite existing files.) I'm leaning towards yes on this one, but it would be great to hear other opinions on it.

I would say yes, but it's probably worth having a reminder of this behaviour in the Issues tab (as a non-severe warning, perhaps even at the lowest "Info" severity).
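
To make the rule discussed above concrete, here is a rough sketch of the decision logic in Python. It is only an illustration of the proposal in this thread, not OpenRefine's actual code (which is written in Java); the names `FileRow` and `plan_upload`, the action strings and the example entity ID are all made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FileRow:
    file_name: str
    # ID of the existing Commons file the cell is matched to, or None when the
    # cell is not reconciled to an existing file (hypothetical representation).
    matched_id: Optional[str]
    # Local path or URL of the file to upload, or None if nothing is provided.
    file_path: Optional[str]


def plan_upload(row: FileRow) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the intended action and a list of (severity, message) issues."""
    issues: List[Tuple[str, str]] = []
    if not row.file_path:
        # No file path: nothing to upload for this row.
        return "no_upload", issues
    if row.matched_id is not None:
        # Matched file + file path: treated as an implicit overwrite,
        # with a low-severity reminder instead of an extra tickbox.
        issues.append(
            ("info", f"A new version of '{row.file_name}' will be uploaded over the existing file.")
        )
        return "upload_new_version", issues
    # Not matched to an existing file: only ever a brand new upload,
    # so an existing file is never overwritten by accident.
    return "upload_new_file", issues


action, issues = plan_upload(FileRow("Example.jpg", "M123", "C:/images/Example.jpg"))
```

The point of the sketch is that the overwrite path is only reachable through a matched cell, which is the safeguard quoted from the GitHub issue, and that the reminder is surfaced as an informational issue.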