Compiling OpenRefine from source is not so easy: you need to have some familiarity with git, and install and configure Java, Maven and NPM on your machine.
I feel like the introduction of snapshot releases has had an interesting and positive impact on the project: because they are easier to install than compiling the tool from source, we have more people trying out new features before release. This is hard to quantify precisely of course, but I am hopeful that we will see a fairly straightforward 3.7 release process thanks to that.
Still, this does not help users test pull requests before they are merged, since packaged versions are only generated after merge. In the Wikimedia Commons integration project, I have the feeling this was a significant source of friction, and that it also introduced an unhelpful bias: I was sort of encouraged to merge PRs quickly, so that they could be tested with the snapshot releases. That’s a problem: testing before merging would be much better.
Ideally, I would like anyone who opened an issue to be able to easily check whether a PR that addresses it indeed solves their problem.
So I am wondering how to reduce this friction. I can think of the following options:
- Make it easier to run OpenRefine from source. This is obviously a win for everyone. Surely we can improve the documentation about that, but it is not clear to me to what extent we can reduce the number of steps to take: we are already considering dropping the “feature” of downloading Maven on the fly in the `refine`/`refine.bat` scripts, so that is going pretty much in the opposite direction. Potentially we could have some helper scripts to help check out a pull request (this is something Zulip does).
- Use a similar GitHub Actions workflow to also publish built packages for pull requests. Those could then be advertised on the pull request, similarly to Netlify’s previews for the website and docs. The downside is that this would make PR builds heavier, and that it is generally quite storage and bandwidth intensive: we would be encouraging people to download hundreds of megabytes to review even a small change. But perhaps the lower entry barrier is worth it.
- Come up with a way to spin up OpenRefine in some cloud provider for a given pull request. We do not officially support hosting OpenRefine, but for testing purposes I think that would be okay. We could have a link from each PR (added by a bot or as a PR check) to automatically spin up an instance of OpenRefine from that PR. This would require cloud resources but could potentially be less wasteful, in the sense that it would only be done when requested. This could be inspired by the OpenRefine deployment on PAWS, although I suspect it relies on having a sort-of fixed Docker image for it.
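
To make the helper-script idea from the first option a bit more concrete, here is a minimal sketch of what such a script could look like. It relies on the fact that GitHub exposes each pull request under the `pull/<number>/head` ref. The script itself, the function names and the `pr-<number>` branch naming are all hypothetical, and it assumes the `origin` remote points at the GitHub repository. It only gets the code onto the user's machine: Java, Maven and NPM are still needed to build.

```python
import subprocess
from typing import List


def pr_checkout_commands(pr_number: int, remote: str = "origin") -> List[List[str]]:
    """Return the git commands that fetch a GitHub pull request and check it out.

    The local branch name "pr-<number>" is an arbitrary choice for this sketch.
    """
    if pr_number <= 0:
        raise ValueError("pull request number must be positive")
    branch = f"pr-{pr_number}"
    return [
        # GitHub publishes the tip of every pull request as pull/<number>/head.
        ["git", "fetch", remote, f"pull/{pr_number}/head"],
        # Create (or reset) a local branch pointing at what was just fetched.
        ["git", "checkout", "-B", branch, "FETCH_HEAD"],
    ]


def checkout_pr(pr_number: int, remote: str = "origin") -> None:
    """Run the commands above in the current working directory (a git clone)."""
    for command in pr_checkout_commands(pr_number, remote):
        subprocess.run(command, check=True)
```

From a clone of the repository, one would call `checkout_pr(<number>)` with the number of the pull request to test, and then start the tool with the `refine` script as usual.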
What do you think?