Which kind of work, which features, which direction will OpenRefine prioritize, going forward? Which use cases and communities will the tool serve in the next 10 years?
I’ll be bold… I want to propose an OpenRefine strategy- and roadmap-building year in 2023 (code name: OpenRefine 2032?). As I am very interested in its outcomes, and I represent communities for whom OpenRefine is currently quite useful (but will it be in the future?), I want to volunteer some of my time (within limitations) to work on this as well.
I can imagine various approaches (and they can be combined):
A general in-depth user survey which is very widely advertised (perhaps even advertised inside OpenRefine itself, so that every user hears about it). More specific than the general two-yearly survey, and with the intention to collect many more responses. Really an in-depth exploration of how often users do certain things with the tool, and what kind of tasks they want to do with it going forward. (I have ideas of how to frame, build and distribute such a survey and would be happy to help with it.)
Roadmapping sprints per OpenRefine user community. Organize a large roadmapping sprint in each significant existing OpenRefine user community as currently already identified through OpenRefine’s two-yearly survey. I can imagine a three-pronged approach for each community separately:
In-person roadmapping meetings during / attached to major sector conferences in 2023 (e.g. the IFLA conference for the international library community)
Online roadmapping and prioritization sessions for the same community around the same time (for people who can’t attend said conference / in person meetup)
And longer-running surveys or dedicated prioritization exercises, possibly here on this forum (can be different from the general survey - e.g. a prioritization survey of what the community has already come up with in the sessions mentioned above)
The intended outcome: a prioritized 10-year roadmap / wishlist for each community.
For such community-specific roadmapping exercises, OpenRefine can perhaps (via funded projects) provide a (paid) facilitator and a structure that can be re-used across communities, so that the process is uniform and participants just need to use existing materials to get going.
I think turnout for each community should also be taken into account in some way: if (despite similar outreach efforts) only three Wikimedians show up, but 80 data journalists and 130 digital humanities scholars too, then that’s something to keep in mind as well.
(I am willing to help with such community-specific exercises for the Wikimedia community.)
This is of course just a proposal and a first idea on how to approach this. I’m curious what others think; that’s why I’m posting this here.
Why am I proposing this? I work a lot with Wikimedians, and I teach OpenRefine in the general cultural sector. The strongest requests I hear from these communities are related to Linked Data use cases: data operations, but with the goal to export to / batch edit other databases; reconciliation and data enrichment. However, I have the impression that other communities would probably prefer OpenRefine to be a tool for cleaning and analyzing ‘big data’, which is a different use case. What are the clearest needs? And how do we expect these needs to evolve over time?
Once that is discussed in depth, a further conversation can start about whether OpenRefine can accommodate all these needs, whether it will go down a specific path, and who is willing and able to work on it. For “my” communities, this will be very helpful to know - is OpenRefine going to be a tool of choice for the next 10 years, will it provide sufficient support for common use cases and hence be worth the investment, or should we look in different directions?
Thank you, @Sandra, for opening this topic. It is an important conversation the community needs to have in 2023, as most of OpenRefine's significant milestones are currently being addressed:
Yes, this is indeed a topic that comes up again and again, and one where we have not made much progress recently.
Personally I have always been held back on this by the fact that what happens in the project depends a lot on what people find the time and motivation to work on, and we have not got so much visibility on this.
I can at least say what I am personally motivated to work on in the next few years (as of today - that can evolve of course):
Onboarding more people on the project, in all sorts of roles: I see that as an important way to keep the project moving;
The reproducibility project which just started. There are a lot of important usability issues connected to this topic, and those improvements are necessary for people to be able to rely on OpenRefine not just for one-off projects (importing this dataset in Wikidata, converting it to RDF…) but be able to rely on it durably for follow-up updates.
Improvements to reconciliation. The current workflows around it are not respectful of users’ time. We can offer a much more principled, reliable and efficient workflow, building on the improvements we have been doing in the W3C Community Group.
Packaging. It’s really not attractive work, but it goes a long way in making OpenRefine easier to adopt for newcomers;
A better extension architecture. At the moment extension developers have too few interface stability guarantees, and the user experience of installing and upgrading extensions is poor. There are a lot of open questions around this since OpenRefine’s architecture is quite uncommon, so there does not seem to be an established off-the-shelf model we can adopt.
General maintenance - there is never a shortage of things to do there, and it can also be satisfying work, thankfully.
Clustering. This is such a popular feature of the tool and still there is so much we could improve there! I guess I have not touched it yet because it feels daunting to change anything in such a successful feature, but if we test carefully I don’t see why we could not make progress.
There are probably quite a few topics missing, but that’s a good start.
I attended this talk by Oleg Nenashev about roadmaps for FOSS projects and found it quite inspiring. The recording is available.
The main takeaway for me is that although it’s hard to predict what will happen even on a relatively short term in a FOSS project (such as a one year horizon), it’s still worth having a document indicating what the current contributors are hoping to get done in the not so distant future. It at least conveys the current direction, the interests of the current project team. This should be useful for new contributors to get a better sense of where the project is heading: they are free to contribute to those stated goals or work on other goals not listed in the roadmap. It should also be useful for partners to understand what we are currently working on.
One major problem that other FOSS projects seem to encounter is how to keep such a roadmap up to date. So I would be interested to think about some regular process to update it.
We currently run our user survey every two years. Why not re-use this schedule? After the results of the user survey are published, we could do some sort of contributor survey, asking people what they see as priorities for the project and what they hope to work on themselves. Doing this after the user survey would make it easier to take into account the needs expressed there.
I am not sure how to survey contributors to make this roadmap. Intuitively, using a form like the user survey is not so fitting, because we are not looking to aggregate a large number of responses into statistics, but rather to let people contribute more elaborate descriptions of what they are working on. Those descriptions should then be consolidated into a nice-looking document. Perhaps it can be as simple as a forum thread asking people to describe their priorities. And then someone works on a blog post / website page to aggregate that.
I’m very fond of Gitea’s approach to this. After each release they start a thread where contributors state what they intend to work on towards the next release. It makes it easy to spot what everyone wants to work on and where one can help.
Short-term roadmap
I like the Gitea approach. Note that they also pin the issue discussing the current release at the top of the issue list so it remains visible. See the go-gitea/gitea issues page on GitHub.
Long-term roadmap
We also need a longer-term roadmap listing what we want to work on. Something simple like a GitHub wiki page or GitHub project. It can be reviewed yearly (or twice yearly) by the community (via an issue or thread on the forum). The goal is to bring visibility to the different initiatives (sometimes happening outside the openrefine/openrefine repo). I don’t think the roadmap should support conversation (like a forum thread). Instead, it should link to the discussion of each item (a forum thread, a GitHub issue …) so conversations remain in context.
I like using GitHub projects, but that might limit who can participate in maintaining the roadmap. Maybe it can already allow any contributor? We’ll need to check. But I guess the roadmap should actually be maintained by contributors and not just users. The discussion can then flow and live in our forum, where users and everyone else can take part.
The GitHub wiki pages allow anyone to edit, but that might open a can of worms, and a wiki is not the best tool for organizing sets of issues and mapping their progress over time.
I felt the need to write up what my own priorities were so I did this as a blog post. As stated there it's not meant to be OpenRefine's roadmap, but perhaps it can encourage others to do something similar, painting where they want to see the project go in the future? And with a bit of luck there is some common ground to be found!