Uptake of Wikimedia Commons features in OpenRefine so far

How have Wikimedia Commons features in OpenRefine been used until now? I have been gathering some numbers around this, for two purposes:

For those who enjoy a deep dive, this spreadsheet gathers all the numbers in detail and links to the source data. The spreadsheet will be updated over time with new measurements (some of the data is not available historically, so new measuring moments need to be added manually). I also added a tab with data that I would be interested in gathering but am not (yet) sure how to collect. If you have suggestions for other data to gather, questions to ask, or sources where I can find data, please add them as suggestions/comments in the spreadsheet.

Profile of users and uploads

Commons features in OpenRefine became available to users around July 2022. One year later, on July 3, 2023, 63,657 files had been uploaded to Wikimedia Commons with OpenRefine. 3.4% of these were uploaded by users who can be classified as 'individual volunteer Wikimedians', 40.7% by staff of Wikimedia affiliates (e.g. Wikimedia chapters), and 56% by staff of cultural institutions (GLAMs).
Uploaded files (2023-07-03)
This contrasts with the breakdown by number of users (rather than by number of files uploaded):
Number of users per category (2023-07-03)
We see that 45.2% of users who have tried the features are individual (non-affiliated) Wikimedia volunteers. Usually, they have only 'dipped their toes' with one or just a few uploaded files (hence the contrast with number of uploaded files).

This means that the vast majority of Commons uploads through OpenRefine are done either by, or on behalf of, cultural institutions: Wikimedia affiliate staff usually do uploads for smaller cultural institutions that don't have the time/capacity/skills to do such uploads themselves. Here (a Wikimedia Commons category) you can page through the files that have been uploaded, to get an idea of what kind of material has been uploaded so far.

In an upcoming user survey I want to get a more detailed view of the profile of users of the Commons features, and of the functionalities and aspects of the workflow they feel more or less confident with.

Uploaders and uploads over time

Besides this breakdown per user group, I'm also tracking how the number of uploads evolves over time. You can see that there is steady growth, with the number of files per uploader slowly increasing (which probably means that fewer uploaders are just trying out the features, and more of them are doing larger, 'serious' uploads). It will be interesting to see how this trend evolves over time.

Number of uploaded files over time
Number of uploaders over time
Average number of files per uploader
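
For anyone who wants to reproduce the last chart from their own measurements: the average is simply the number of uploaded files divided by the number of distinct uploaders at each measuring moment. Below is a minimal Python sketch of that calculation. The `Measurement` class and all figures in it are hypothetical and only serve as illustration; they are not the real numbers from the spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    date: str       # measuring moment (ISO date)
    files: int      # cumulative number of files uploaded with OpenRefine
    uploaders: int  # cumulative number of distinct uploaders


def files_per_uploader(m: Measurement) -> float:
    """Average number of files per uploader at one measuring moment."""
    if m.uploaders == 0:
        return 0.0
    return m.files / m.uploaders


# Hypothetical figures, for illustration only (not the real measurements).
measurements = [
    Measurement("2023-01-01", files=1_000, uploaders=20),
    Measurement("2023-04-01", files=3_000, uploaders=40),
    Measurement("2023-07-01", files=6_000, uploaders=60),
]

averages = [files_per_uploader(m) for m in measurements]
for m, avg in zip(measurements, averages):
    print(f"{m.date}: {avg:.1f} files per uploader")
```

A rising average, as in this made-up series, is what you would expect when larger batch uploads start to outweigh one-off trial uploads.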

Usage / uptake of uploads on Wikimedia projects

It is also possible to get an idea about 'uptake' and re-use of uploaded files on Wikimedia Commons. How often are these files used in other Wikimedia projects, most notably as illustrations of Wikipedia articles, but also as illustrations / main images of Wikidata items?
I have compared usage of uploads via OpenRefine with uploads via the GLAMwiki Toolset, a now-defunct batch uploading tool which served 'advanced' use cases (large uploads via XML, often tens of thousands of files at the same time or more).


As of September 3, 2023, 34.55% of all files uploaded with OpenRefine were in use on Wikimedia projects, compared to only 9.10% for the GLAMwiki Toolset. 34.55% is not bad at all: this number is often below 10%. In the case of the OpenRefine uploads, it looks like many uploaded images have been immediately connected with Wikidata items, which explains this high percentage (and makes sense, as OpenRefine makes it possible to combine these workflows).
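
The usage percentage here is the share of uploaded files that are used at least once on any Wikimedia project. As a rough sketch of that arithmetic in Python: the function name `usage_share` and the counts passed to it are hypothetical, chosen only so that the results match the two percentages mentioned above; the actual file counts are in the spreadsheet.

```python
def usage_share(files_in_use: int, total_files: int) -> float:
    """Percentage of uploaded files used at least once on a Wikimedia project."""
    if total_files <= 0:
        raise ValueError("total_files must be positive")
    return round(100 * files_in_use / total_files, 2)


# Hypothetical counts, for illustration only (not the real totals).
openrefine_share = usage_share(files_in_use=3_455, total_files=10_000)
toolset_share = usage_share(files_in_use=910, total_files=10_000)

print(f"OpenRefine: {openrefine_share}%")
print(f"GLAMwiki Toolset: {toolset_share}%")
```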

As mentioned above, I welcome tips and advice on gathering more statistics: both things to measure, and how to measure them 🙂