Hi, I am currently trying to familiarize myself with Wikimedia Commons uploads using OpenRefine 3.7.3 on Windows 10. My first use case is to upload additional image versions to existing Commons File objects.
Recently, I uploaded ~200 images with Pattypan. Accidentally, instead of the original image size, I uploaded smaller image versions. Now I would like to enhance the existing file objects with the original-sized images.
Is this possible with OpenRefine? My sample OR project has two rows. I reconciled the file name column with Wikimedia Commons. The file path column contains the path to the larger image version on my local disk. I created a schema that contains only file name and file path:
I am able to log in to Wikimedia Commons with a bot password. When I click "Upload edits to Wikibase", it looks like something is uploaded. But when I look at the File object, nothing has changed. Here is the output of the OpenRefine log:
17:55:01.612 [ refine] POST /command/wikidata/perform-wikibase-edits (6ms)
17:55:01.614 [..mWikibaseEditsOperation] Performing edits (2ms)
17:55:01.614 [..ting.EditBatchProcessor] Requesting documents (0ms)
17:55:02.665 [ refine] GET /command/core/get-history (1051ms)
17:55:02.677 [ refine] GET /command/core/get-project-metadata (12ms)
17:55:02.680 [ refine] GET /command/core/get-models (3ms)
17:55:02.683 [ refine] POST /command/core/get-all-preferences (3ms)
17:55:02.686 [ refine] POST /command/core/get-rows (3ms)
17:55:02.686 [ refine] POST /command/core/compute-facets (0ms)
17:55:02.704 [ refine] GET /command/core/get-preference (18ms)
17:55:02.722 [ refine] GET /command/core/get-csrf-token (18ms)
17:55:02.724 [ refine] POST /command/wikidata/preview-wikibase-schema (2ms)
Am I doing something wrong? Or is it simply not possible to upload additional file versions with OpenRefine?
In the meantime, if Pattypan does not support uploading new versions of files, perhaps the simplest option is to ask for deletion of that batch and re-upload it?
It's indeed a very valid request and your question made me curious! While developing Wikimedia Commons features in OpenRefine last year, we indeed briefly discussed allowing overwriting files, but then decided to not implement it (yet). I can't exactly remember why; perhaps just because of insufficient time and/or because I was not sure if it would be an often-requested use case.
Antonin's suggestion is indeed one possible approach at this moment (deletion request, and re-uploading), but if your files are already used to illustrate Wikipedia articles or Wikidata items, it would remove them there automatically too, which is a pity.
I think some Wikimedians do batch overwrites with a bot. I am currently asking around.
If there are not too many files, you could also overwrite them manually, one by one, on each file page (yes, tedious, and not doable if you have hundreds or thousands of files...).
Those are fair questions and observations. I will soon announce a new project to create better documentation and train more people to use the features.
Using JavaScript and bots is not available to everyone, and I think OpenRefine does fill a gap for people and organizations who don't have these skills. But OpenRefine is certainly not the only tool that can get the job done, and for many kinds of uploads other tools and processes may indeed be a lot more suitable.
They don’t have the person as an object; instead, they say that the person in this picture is the same person as in the picture with id xxx. So if we have one ID in WD, we can also find more pictures that people in SPA have said show the same person.
It feels like all those pieces should make it much easier to match with OpenRefine, but I gave up and did a lot of cleaning manually afterwards... After that, I tested extracting pictures with a notebook, uploading them in batch, and doing the cleaning/matching in Wikimedia Commons: spa2Commons/Notebook/Sveriges Riksdag i SPA.ipynb at main · salgo60/spa2Commons · GitHub. In a notebook I also get a preview of the pictures...
the nice thing with massupload is that duplicates are directly identified
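If "duplicates" here means byte-identical files, one way to check for them before an upload is to compare SHA-1 hashes, since the MediaWiki API can list files by hash (`list=allimages` with `aisha1`). Below is a minimal sketch of that idea; the function names are made up for illustration, and the Commons API URL is used as an assumed default.

```python
import hashlib
from urllib.parse import urlencode


def file_sha1(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a local file, read in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_lookup_url(sha1_hex, api="https://commons.wikimedia.org/w/api.php"):
    """Build a query URL that asks the wiki for files with this exact hash."""
    query = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1_hex,  # hex-encoded SHA-1 of the file content
        "format": "json",
    }
    return f"{api}?{urlencode(query)}"
```

Fetching that URL and checking whether the `allimages` list in the response is empty would tell you whether an identical file already exists. This only catches exact copies, not resized or re-encoded versions.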
I think it is clear that many things in this integration can be improved and your feedback can very much help with that.
When you say "missed preview", what sort of preview are you thinking about? Is it that you'd need thumbnails for the files shown in OpenRefine before the upload? Or are you referring to a preview of the rendered wikitext? Or something else?
What sort of cleaning did you have to do after the upload?
Which sorts of duplicates? Files that you are uploading which are identical to existing files on Commons? Or naming conflicts? Or something else?
There are unfortunately no user-friendly tools with which you can do this in a more or less 'point and click' way, but it can be done via script or bot.
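For anyone who wants to go the script route, here is a rough sketch of what such a script has to send. It only builds the form fields for the MediaWiki `action=upload` API module; logging in, fetching a CSRF token, and the actual multipart POST (with the image in the `file` field) are left to whichever HTTP client or bot framework you use. The function name and the default comment are hypothetical.

```python
from pathlib import Path


def build_overwrite_request(file_name, local_path, csrf_token,
                            comment="Upload original-sized version"):
    """Return (form_fields, path) for an upload that replaces an existing file.

    The file content itself is not included here; it would be sent
    as the multipart "file" field alongside these form fields.
    """
    # Page titles carry a "File:" prefix; the upload module takes the bare name.
    prefix = "File:"
    bare_name = file_name[len(prefix):] if file_name.startswith(prefix) else file_name
    fields = {
        "action": "upload",
        "filename": bare_name,
        "comment": comment,
        # Uploading over an existing name normally only returns a warning;
        # this flag tells the API to go ahead and store the new version.
        "ignorewarnings": "1",
        "token": csrf_token,
        "format": "json",
    }
    return fields, Path(local_path)
```

For a batch, you would loop over the rows of your table (file name plus local path) and send one such request per row, ideally with a pause between requests and on an account that is allowed to make bulk edits.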
I have the impression that overwriting files is a rare-ish use case. I'm curious how often folks would want to do this, and whether it is also relevant for Wikibases in general. I suggest we continue discussing this at the GitHub issue that Antonin started.
To give an example, I'm uploading maps for Italian municipalities. It could happen that there is a mistake for a province and I need to upload an updated version for all 150+ municipalities of that province. It would be really nice to have such an option.
I think it wouldn't be too much work to add support for this in OpenRefine (but I haven't looked into it very closely). It's mostly a design question: is the feature going to be discoverable enough, and how do we minimize the risk that people upload new versions by mistake? Even if it's a rare use case, I think it's still worth providing this: OpenRefine should let you override everything that you upload, so you don't have to get it 100% right the first time.
Currently looking into this one and thinking a bit about when you may assume that the intention is to overwrite a file.
My current thinking is the same as what @antonin_d expressed on GitHub, i.e., "OpenRefine should only accept to upload a new version if the [filename] cell is reconciled to a matched item. Otherwise, it could be that the user believes they will upload a brand new file, and they end up overwriting an existing one."
But my question is: if you have reconciled the filename against a matched file and you provide a file path, can it be implied that you always want to overwrite the file? (If yes, you wouldn't need an additional tickbox in the schema to say you want to overwrite existing files.) I'm leaning towards yes on this one, but it would be great to hear other opinions on it.
I would say yes, but it's probably worth having a reminder of this behaviour in the Issues tab (as a non-severe warning, perhaps even at the lowest "Info" severity).
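To make the rule being discussed concrete, here is a small sketch of the decision logic in Python. It is not OpenRefine's implementation (OpenRefine is written in Java), and all names in it are invented for illustration. The assumption is the one proposed above: a matched cell plus a file path means "upload a new version", reported as an info-level issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UploadPlan:
    action: str  # "new_version", "new_file", or "no_upload"
    issues: List[str] = field(default_factory=list)


def plan_upload(matched_existing_file: bool, file_path: Optional[str]) -> UploadPlan:
    """Decide what a row should do, given its reconciliation state and file path."""
    if not file_path:
        # No local file given: only metadata edits, nothing to upload.
        return UploadPlan("no_upload")
    if matched_existing_file:
        # Overwrite only when the cell is reconciled to an existing file,
        # and remind the user about it at the lowest severity.
        return UploadPlan(
            "new_version",
            ["info: a new version of an existing file will be uploaded"],
        )
    # Not matched: treat it as a brand new file, never as an overwrite.
    return UploadPlan("new_file")
```

With this shape there is no extra tickbox: the overwrite is implied by the combination of the two inputs, and the issue message is the only place where the user is reminded of it.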