Wikitext in OpenRefine

When teaching how to upload images to Wikimedia Commons using OR, I have noticed that some people are already very used to generating Wikitext the way it is done in PattyPan, and find it difficult to do this on their own in a spreadsheet or in OR itself.

I know it's possible to insert SDC and reduce Wikitext usage, but that would be one more thing for that person to learn. In other words, they must learn SDC, learn OR, and assemble their own Wikitext by hand.

I'm thinking about creating a tool on Toolforge where this person could create their Wikitext in the same way they would in PattyPan, as I believe it would be worthwhile to build on what people already know. They could then use the generated spreadsheet in OR. This way, they would only need to learn how to upload images using OR. Over time, they can learn SDC and upload that data using OR as well.

Before creating this tool, I would like to know if there is already something being developed in OR or if you intend to implement something similar.

Hi @Lucas.Belo

Although I've used Wikimedia Commons, and understand the basic workflow for using OR to add files to WC, I'm not familiar enough with other tools or concepts to understand everything here. I wonder if you could expand a bit on the use of Wikitext, how the generation is done in PattyPan, and what SDC stands for?

Perhaps an example would help me understand the challenges and how your proposed solution would help, and whether there is any existing function in OpenRefine that could help?



Hi @ostephens. Sorry, I realized my post was lacking context; I'll improve it.

To upload images to Wikimedia Commons through OpenRefine you must provide a file path, a descriptive file name and wikitext.

Structured data for Commons (SDC) is optional.

For many users, writing wikitext on their own can be a challenge, and as far as I know, OpenRefine does not offer any help with this. On the other hand, OR also makes it possible to send SDC, which can reduce the use of wikitext.

I realize that the Wikimedia Commons community is already very familiar with PattyPan, as this software has features for assembling Wikitext. I believe that offering features to assemble Wikitext (using what the community already knows) will be easier than teaching SDC, which for many is something new. Even when using SDC, it will still be necessary to assemble some Wikitext.

In opening this discussion, I would like to know whether anything is being developed so that OpenRefine offers support for creating Wikitext. If work on this hasn't started yet, or it isn't on the horizon, I thought I'd create a tool in Python to provide it. However, I later realized that it is possible to create extensions for OpenRefine and that this would be a better way. But this might be a more difficult path for me, and I would probably need help.
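To make the idea more concrete, here is a minimal sketch of what such a Python helper could look like. It is only an illustration under stated assumptions: the column names (`description`, `date`, `source`, `author`, `license`) are hypothetical, the `license` column is assumed to hold the name of a Commons license template, and the layout uses the `{{Information}}` template. It is not how PattyPan itself works, and it is not an existing OpenRefine feature.

```python
import csv
import io


def build_wikitext(row):
    """Assemble wikitext for one file from a row of metadata.

    The keys used here are hypothetical column names chosen for this sketch.
    """
    lines = [
        "== {{int:filedesc}} ==",
        "{{Information",
        "|description=" + row["description"].strip(),
        "|date=" + row["date"].strip(),
        "|source=" + row["source"].strip(),
        "|author=" + row["author"].strip(),
        "}}",
        "",
        "== {{int:license-header}} ==",
        # Assumption: the license column contains a template name.
        "{{" + row["license"].strip() + "}}",
    ]
    return "\n".join(lines)


def add_wikitext_column(csv_text):
    """Return the CSV text with an extra 'wikitext' column appended."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    fieldnames = list(reader.fieldnames) + ["wikitext"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["wikitext"] = build_wikitext(row)
        writer.writerow(row)
    return out.getvalue()
```

The resulting CSV could then be imported into OR, with the `wikitext` column mapped to the wikitext field alongside the file path and file name columns.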

Here is a little about PattyPan and its interface for creating wikitext.

Hi @Lucas.Belo,

Thanks a lot for your interest in helping out in this area!
As part of a follow-up grant from the WMF we have a small budget set aside for making some improvements to the SDC integration. We haven't decided yet which improvements this will cover exactly.

A more user-friendly wikitext generation system would be nice, but I think that would likely be too big to develop within that grant.

Given that the Commons community is gradually simplifying the structure of the wikitext so that information is pulled from SDC instead (with the adoption of {{License from structured data}} for instance), my hope would be that at some point, a constant wikitext (not depending on the file at all) could be used, in which case we'd be able to simply pre-fill it in the predefined schemas that are shipped with the tool. Something like:

== {{int:filedesc}} ==

== {{int:license-header}} ==
{{License from structured data}}

That would mean that users wouldn't have to interact with this field at all and would simply be able to drag and drop columns into the structured data fields. But I am not sure if the Commons community wants to go down that route.

In the meantime, I think it would be really great to have helper tools or OpenRefine extensions which ease this workflow.

Hi Lucas,

Thanks for your interest in working on this.

I'm totally with Antonin on this topic though - not just for OpenRefine, but for Wikimedia Commons in general: I think we should strive for "structured data first" uploads with as little Wikitext as possible. Flickypedia is also going that route, and I have a general impression that the Commons community does not oppose this at all.

In my experience working with cultural institutions, SDC makes much more sense to them (certainly if they have not used Wikitext before, but also if they have been used to Wikitext but they see the benefits of SDC - easier retrieval of improved metadata, multilinguality, avoiding data duplication/data drift).

At this moment, I'm bringing together a small (partly volunteer, partly staff) working group on the Wikimedia Commons side to develop more Lua-driven Wikitext templates, and I already have a small group of 4-5 people together who are quite interested in working on this. Would you perhaps consider joining this group too, instead? We can certainly use more Lua capacity on Wikimedia projects.

I'll also put this on the agenda of our chat next week (in the context of the train the trainer course).


I agree with the other comments here about making better use of the advantages of SDC:

  • it has more of a future;
  • it provides, IMO, a lot better metadata management and resource discoverability;
  • it's 100% aligned with the OR paradigm;
  • so it's faster (just to me?) to set up the project's media metadata;
  • and wrangling wikitext is a pain in the *ss.

In my experience, when you are making the effort to learn a new tool like OR, you deserve to know how to exploit it to your best advantage. The extra effort of dropping the old wikitext-intensive paradigm would be well rewarded.

P.S.: I'm such an SDC believer these days that I'm convinced this feature should be an extension bundled with standard Wikibase by default.