Improving the UX of extension install, and Butterfly

The problem of improving OpenRefine’s extensibility is becoming more pressing in my mind.

There are some things in OpenRefine for which I have a fairly clear mental picture of how things should be and I just lack the time to actually implement those ideas. For this problem, however, I still have quite a few open questions, so let me write them down here, because I am a bit stuck.

What are the problems with OpenRefine’s current extension system?

  • The user experience of installing and using extensions is not great.
    • Downloading a zip file and unzipping it in a particular directory is not so easy. People are used to a full graphical experience, ideally with an online searchable catalogue listing extensions (such as extension stores for web browsers).
    • When upgrading OpenRefine, if some of the installed extensions are not compatible with the new version, OpenRefine will likely fail to start, without a clear error message.
  • The developer experience of creating and maintaining extensions is not great.
  • Extensions tend to be only compatible with fairly specific versions of OpenRefine, mostly because of a series of migrations we did between OpenRefine 2.8 and 3.3 (which were often forced on us). This is annoying both for users and developers. We have been a bit more stable recently but there can always be big looming changes we are not aware of yet.

Why those problems are becoming more pressing

  • The Wikibase extension has been growing in size and deserves to be in its own repository in my opinion. By moving an extension that is currently shipped with the main software to an external repository, we will ask people to install the extension manually, which is a new incentive to have a good user experience for that.
  • New extensions (such as the CommonsExtension) were developed by the OpenRefine team outside of the main repository, so our awareness of the pain points above has grown.
  • The new architecture introduced in the 4.0 branch is a massive compatibility break, so the detection of extension compatibility will be all the more important for this release. And because any changes to the extension mechanism of OpenRefine will most likely be breaking changes, perhaps this is a good opportunity to introduce them.
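To make the compatibility detection mentioned above a bit more concrete, here is a minimal sketch of a version-range check that could run before an extension is loaded. It assumes that each extension declares a minimum (inclusive) and maximum (exclusive) supported OpenRefine version, which is not something extensions do today; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not part of OpenRefine or Butterfly.
final class ExtensionCompat {
    // Parses "3.7.2" into {3, 7, 2}; qualifiers such as "-SNAPSHOT" are not handled.
    static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    // Compares two dotted versions numerically, treating missing parts as 0.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // True when minInclusive <= running < maxExclusive.
    static boolean isCompatible(String running, String minInclusive, String maxExclusive) {
        return compare(running, minInclusive) >= 0 && compare(running, maxExclusive) < 0;
    }

    // Returns a message to show to the user, or null when the extension can be loaded.
    static String explainIfIncompatible(String extension, String running, String min, String max) {
        if (isCompatible(running, min, max)) {
            return null;
        }
        return "Extension '" + extension + "' supports OpenRefine >= " + min + " and < " + max
                + ", but this is OpenRefine " + running + "; it was not loaded.";
    }
}
```

With a check like this, an incompatible extension could be skipped with a readable message instead of preventing the application from starting.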

Possible solutions

Since our extension mechanism is determined by our use of Butterfly, the main question is what to do with Butterfly.

  • We could decide to migrate to something else that provides a better experience. This is a question that has been discussed many times (such as here). The problem is, I am not aware of any framework that would really be comparable. This is because applications like OpenRefine (browser-based but with a local server, and extensible) are rather uncommon, so it’s not a surprise there aren’t that many frameworks to support them.
  • In a sense, as the only known users of Butterfly, we are in a position to do pretty much what we want with this framework. We can change whatever aspect of it as we need. Whenever vulnerabilities are discovered (such as Log4Shell) we are able to patch it, and we control the release cycle so we can make it match ours. So we could decide to stick with it and just improve Butterfly so that it fits our bill. Which might involve rewriting large parts of it, though.

What do we actually need? From my perspective, the challenge is to find a system which lets extensions patch the backend and the frontend at the same time. We want users to be able to use our web UI to install an extension (probably packaged as some archive, but the user should not need to know that), and that extension should be able to provide new functionality both in the backend and in the frontend. This has a few implications:

  • Patching the backend is not too hard: this can be done by adding .class or .jar files provided by the extension to the Java classpath (or dynamically loading them via a classloader), and making sure the main application discovers the components exposed by the extension (for instance via the SPI and/or OSGi, or via explicit registration in a configuration file as is currently done).
  • Patching the frontend seems more difficult. The modern ways to combine JS modules together (using import statements and tools such as webpack) require external tooling that would need to be embedded in OpenRefine itself to perform those compilation steps at runtime, if we want to do better than just concatenating vanilla JS files together. Can we get away with something that does not require shipping the whole npm developer ecosystem to all our users? This blog post presents an example plug-in architecture for a single page app written with React, and there is another solution for Vue.js - it all looks pretty hacky so I wonder if there exist established versions of such things.
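For the backend half, the classloader plus SPI route mentioned above could look roughly like the sketch below. It relies only on the standard `URLClassLoader` and `ServiceLoader` classes; the `RefineExtension` interface and the `ExtensionDiscovery` class are hypothetical names made up for this illustration, not existing OpenRefine or Butterfly APIs.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical contract that an extension jar would implement.
interface RefineExtension {
    String name();
}

final class ExtensionDiscovery {
    // Builds a class loader over the given extension jars and asks ServiceLoader
    // for every provider they declare in META-INF/services/RefineExtension.
    static List<RefineExtension> discover(List<Path> jars) {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < jars.size(); i++) {
            try {
                urls[i] = jars.get(i).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad extension path: " + jars.get(i), e);
            }
        }
        // The loader is deliberately left open: closing it would prevent the
        // discovered extensions from loading further classes later on.
        URLClassLoader loader = new URLClassLoader(urls, ExtensionDiscovery.class.getClassLoader());
        List<RefineExtension> found = new ArrayList<>();
        for (RefineExtension extension : ServiceLoader.load(RefineExtension.class, loader)) {
            found.add(extension);
        }
        return found;
    }
}
```

This does nothing for the frontend question, and it gives extensions no isolation from each other's dependencies; it only shows that the discovery step on the Java side needs very little machinery.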

Where to learn about these things?

There are probably tons of relevant systems which try to solve similar problems and I am just not aware of them. Where do I find knowledgeable people to talk to? One thing I have been thinking about is to spend some time writing small extensions for major software platforms, and use this opportunity to learn about extension mechanisms used by successful software projects. I have been thinking about writing reconciliation-related extensions for mainstream spreadsheet software (LibreOffice, StarOffice, Google Sheets…) so that could be a good opportunity, but it is already fairly clear to me that the solutions adopted there will not be directly applicable to OpenRefine as long as we are using this weird “locally web-based” architecture.

On Mastodon, Michael Lipp shared his JGrapes Web Console project which has tackled a very similar problem. He gives some details about how he solves the problem of injecting new assets on the front-end side, at runtime. On the back-end side, he uses OSGi too.

We should look at how Nextcloud does things. They let users install “apps”, which can modify both the backend (PHP) and the frontend (Vue.js). On the surface, this seems like something we could take inspiration from:
https://docs.nextcloud.com/server/latest/developer_manual/app_development/index.html

In the backend, apps are able to rely on PHP libraries but they do not have any isolation in place (a given library should not have different versions required by different apps running at the same time).
In the frontend, they use Webpack to create one JavaScript file per app, using a Webpack configuration which (probably) avoids the inclusion of dependencies which are already supplied by the core.

Nextcloud isn’t exactly a “locally web-based” tool like OpenRefine is, but they let end users install and upgrade apps, so it feels like this could be a fitting architecture for OpenRefine too.

I keep thinking about this topic, as an important task that's on my back burner.

After looking at other projects (typically Nextcloud above, or this video about Pretix's plugin system more recently), I am more and more convinced that we should not look for an existing application framework that would fulfill our extensibility needs. Projects like Nextcloud or Pretix build their own plugin architecture and that's fine: the extensibility needs are different and it's helpful to be in control of that instead of being tied to an external dependency.

Perhaps one notable exception I am aware of is Gephi, which relies on the NetBeans platform (which comes with the plugin system). I'll try to reach out to the maintainers to check how happy they are with this, and whether they have suggestions for a web-based app like OpenRefine.

But generally I think we should just embrace Butterfly and revamp it to fit our needs. The immediate tasks I see are:

  1. migration to a recent version of Jetty
  2. migration to a declarative format for registering components provided by the extension (#5664)
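Purely to illustrate what "declarative" could mean in the second task, here is a minimal sketch that reads component registrations from a properties-style manifest instead of executing code. The `command.<name>=<class>` key convention, the manifest contents and the `ExtensionManifest` class are all assumptions made up for this example; this is not the format discussed in #5664.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical reader for a hypothetical manifest format.
final class ExtensionManifest {
    // Collects every "command.<name>=<class name>" entry into a name -> class name map.
    // Class names are only read as strings here; nothing is loaded or instantiated.
    static Map<String, String> commands(String manifestText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(manifestText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> commands = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("command.")) {
                commands.put(key.substring("command.".length()), props.getProperty(key).trim());
            }
        }
        return commands;
    }
}
```

For instance, a manifest containing the line `command.do-thing=com.example.DoThingCommand` (an invented class name) would yield a single entry mapping `do-thing` to that class name. Because the host application reads the data rather than running extension code, it can inspect or validate what an extension provides before loading anything.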

The more long-running improvements (for which I don't know what it should look like yet) are:

  • isolation of CSS / JS code provided by the extensions (so that a JS error in an extension does not abort the entire app? We can probably not have full guarantees, but maybe there are ways to avoid the most catastrophic failures)
  • improvements about the way extensions can rely on additional libraries and avoid conflicts between those

Those two points seem to be problems that no-one claims to have fully solved as far as I am aware, so we also should not get roped up too much in them. As much as it makes sense to minimize migrations for extension developers, it's not worth blocking all improvements because we haven't yet figured out the perfect solution from the start.

I wonder what @tfmorris and @abbe98 think about that? What do you think about the phasing of such changes, would you be happy with first introducing 1. and 2. in some stable version, and having extension maintainers migrate to that first?

The Eclipse Foundation has some projects that utilize extensions so perhaps it would be wise to check in with them (no stone unturned).

But generally I think we should just embrace Butterfly and revamp it to fit our needs.

I'm all for this as I have ended up liking Butterfly on a conceptual level and I do not think it's a bad solution if given some love.

migration to a recent version of Jetty

Oh yes, I know of at least one issue on my end this would unblock.

migration to a declarative format for registering components provided by the extension

I'm all for it although I would suggest that a proposal for this declarative format is presented before one starts implementing it.

isolation of CSS / JS code provided by the extensions (so that a JS error in an extension does not abort the entire app? We can probably not have full guarantees, but maybe there are ways to avoid the most catastrophic failures)
improvements about the way extensions can rely on additional libraries and avoid conflicts between those

Those two points seem to be problems that no-one claims to have fully solved as far as I am aware, so we also should not get roped up too much in them. As much as it makes sense to minimize migrations for extension developers, it's not worth blocking all improvements because we haven't yet figured out the perfect solution from the start.

I'm not aware of any solutions either. However, just documenting the practice of "namespacing" CSS (class prefixes) and JavaScript (window objects) could go a long way towards improving today's situation. It's on my todo-list for things to work on in core as I know it would resolve some issues there.

Can somebody guide me on how to contribute?

@anasadelopo great that you are interested in contributing! Please have a look here: Getting started | OpenRefine
and start a new thread if you need any help for your first contributions.

The thread here is about a rather technical subject which is not suitable for a first contribution.

Sorry I missed the ping on this thread. That approach sounds reasonable to me.

But generally I think we should just embrace Butterfly and revamp it to fit our needs.

I'm all for this as I have ended up liking Butterfly on a conceptual level and I do not think it's a bad solution if given some love.

Agree.

migration to a recent version of Jetty

Oh yes, I know of at least one issue on my end this would unblock.

I have branches with recent versions of Jetty. I'll check on their status.

migration to a declarative format for registering components provided by the extension

I'm all for it although I would suggest that a proposal for this declarative format is presented before one starts implementing it.

Agree with suggestion for design review before implementation (including extension developers), although I recognize that some prototyping using an existing extension may provide useful feedback.

I'm not aware of any solutions either. However, just documenting the practice of "namespacing" CSS (class prefixes) and JavaScript (window objects) could go a long way towards improving today's situation. It's on my todo-list for things to work on in core as I know it would resolve some issues there.

I think conventions such as namespaces are a perfectly good solution. Not everything has to involve code.

Tom
