OpenRefine 2032 ... what direction does OpenRefine want to go?

Which kind of work, which features, which direction will OpenRefine prioritize, going forward? Which use cases and communities will the tool serve in the next 10 years?

I’ll be bold… I want to propose an OpenRefine strategy- and roadmap-building year in 2023 (code name: OpenRefine 2032?). As I am very interested in its outcomes, and I represent communities for whom OpenRefine is currently quite useful (but will it be in the future?), I want to volunteer some of my time (within limitations) to work on this as well.

I can imagine various approaches (and they can be combined):

  • A general in-depth user survey which is very widely advertised (perhaps even advertised inside OpenRefine itself, so that every user hears about it). More specific than the general two-yearly survey, and with the intention to collect many more responses. Really an in-depth exploration of how often users do certain things with the tool, and what kind of tasks they want to do with it going forward. (I have ideas of how to frame, build and distribute such a survey and would be happy to help with it.)

  • Roadmapping sprints per OpenRefine user community. Organize a large roadmapping sprint in each significant existing OpenRefine user community as currently already identified through OpenRefine’s two-yearly survey. I can imagine a three-pronged approach for each community separately:

    1. In-person roadmapping meetings during / attached to major sector conferences in 2023 (e.g. the IFLA conference for the international library community)
    2. Online roadmapping and prioritization sessions for the same community around the same time (for people who can’t attend said conference / in-person meetup)
    3. And longer-running surveys or dedicated prioritization exercises, possibly here on this forum (can be different from the general survey - e.g. a prioritization survey of what the community has already come up with in the sessions mentioned above)
    • The intended outcome: a prioritized 10-year roadmap / wishlist for each community.
    • For such community-specific roadmapping exercises, OpenRefine can perhaps (via funded projects) provide a (paid) facilitator and a structure that can be re-used across communities, so that the process is uniform and participants just need to use existing materials to get going.
    • I think turnout for each community should also be taken into account in some way: if (despite similar outreach efforts) only three Wikimedians show up, while 80 data journalists and 130 digital humanities scholars do, then that’s something to keep in mind as well.
      (I am willing to help with such community-specific exercises for the Wikimedia community.)

This is of course just a proposal and a first idea on how to approach this. I’m curious what others think; that’s why I’m posting this here.

Why am I proposing this? I work a lot with Wikimedians, and I teach OpenRefine in the general cultural sector. The strongest requests I hear from these communities relate to Linked Data use cases: data operations with the goal of exporting to / batch editing other databases, plus reconciliation and data enrichment. However, I have the impression that other communities would probably prefer OpenRefine to be a tool for cleaning and analyzing ‘big data’, which is a different use case. What are the clearest needs? And how do we expect these needs to evolve over time?

Once that is discussed in depth, a further conversation can start about whether OpenRefine can accommodate all these needs, whether it will follow a specific path, and who is willing and able to work on it. For “my” communities, this will be very helpful to know - is OpenRefine going to be a tool of choice for the next 10 years, will it provide sufficient support for common use cases and hence be worth the investment, or should we look in different directions?

Thank you, @Sandra, for opening this topic. It is an important conversation the community needs to have in 2023, as most of OpenRefine’s significant milestones are currently being addressed:

— so what’s next?

Yes, this is indeed a topic that comes up again and again, and one where we have not made much progress recently.
Personally I have always been held back on this by the fact that what happens in the project depends a lot on what people find the time and motivation to work on, and we do not have much visibility on that.

I can at least say what I am personally motivated to work on in the next few years (as of today; that can evolve of course):

  • Onboarding more people on the project, in all sorts of roles: I see that as an important way to keep the project moving;
  • The reproducibility project, which just started. There are a lot of important usability issues connected to this topic, and those improvements are necessary for people to be able to rely on OpenRefine not just for one-off projects (importing this dataset into Wikidata, converting it to RDF…) but also durably, for follow-up updates;
  • Improvements to reconciliation. The current workflows around it are not respectful of users’ time. We can offer a much more principled, reliable and efficient workflow, building on the improvements we have been making in the W3C Community Group;
  • Packaging. It’s really not attractive work, but it goes a long way in making OpenRefine easier to adopt for newcomers;
  • A better extension architecture. At the moment extension developers have too few interface stability guarantees, and the user experience of installing and upgrading extensions is poor. There are a lot of open questions around this since OpenRefine’s architecture is quite uncommon, so there does not seem to be an established off-the-shelf model we can adopt.
  • General maintenance - there is never a shortage of things to do there, and it can also be satisfying work, thankfully.
  • Clustering. This is such a popular feature of the tool, and still there is so much we could improve there! I guess I have not touched it yet because it feels daunting to change anything in such a successful feature, but if we test things carefully I don’t see why we could not make it better.

There are probably quite a few topics missing, but that’s a good start.
