Scheduling breaking changes we have on our radar

  • major upgrades to libraries which are part of the extension interface (Jetty, Jackson, maybe others?) - such as PR #6077

The Butterfly upgrade includes:

  • Jetty 10 -> 12 (We only recently upgraded from Jetty 9 to Jetty 10 and haven't released that yet)
  • Java Servlet 4 -> Servlet 6 (most common uses are HttpServletRequest, HttpServletResponse, ServletException which go from javax.* to jakarta.* but rather than rote renaming, we should see how many references we can eliminate the need for)
  • Velocity 1.x -> 2.3
  • Java 8 -> Java 9 (we already require Java 11)
  • Apache Commons File Upload 1.5 -> 2.0
  • Apache commons-lang - removed (undeclared transitive dependency that OpenRefine was depending on)
  • removing Jackson from the extension interface by introducing other mechanisms for JSON serialization, as advocated for by @tfmorris (given Jackson's stability it's not clear to me that it's worth the breakage)

I think as a matter of good API hygiene we should try to exclude concrete classes and third party dependencies. The fewer requirements for shared common components we place on the extensions, the better, I think, but obviously some are necessary. For example, having them share a common logging infrastructure is a no-brainer.
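To make the hygiene point concrete, here is a minimal sketch of an extension-facing contract whose signatures mention only JDK types. The names `JsonExporter` and `SimpleJsonExporter` are hypothetical and not part of OpenRefine; the hand-rolled serializer is only a stand-in for whatever library the core would actually use behind the interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical extension-facing contract: only JDK types appear in the signature,
// so the core can swap its JSON library without breaking extensions.
interface JsonExporter {
    String toJson(Map<String, Object> fields);
}

// Hypothetical core-side implementation. A real one would likely delegate to a
// JSON library; this stand-in handles only flat maps of strings, numbers,
// booleans and nulls, and escapes only backslashes and double quotes.
final class SimpleJsonExporter implements JsonExporter {
    @Override
    public String toJson(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(entry.getKey())).append("\":");
            Object value = entry.getValue();
            if (value == null || value instanceof Number || value instanceof Boolean) {
                sb.append(value); // appends the literal null, a number, or true/false
            } else {
                sb.append('"').append(escape(value.toString())).append('"');
            }
        }
        return sb.append('}').toString();
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

An extension written against `JsonExporter` never imports a third-party JSON class, so a major upgrade of that library stays an internal change.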

  • replacing the registration of components contributed by extensions from controller.js to a declarative format (perhaps it can be done without breaking, if supporting both extension formats for a while is doable?)

This definitely seems like something that could be phased in with a deprecation period for the old style, if it makes sense.
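As a rough illustration of what such a phase-in could look like, the sketch below prefers a declarative descriptor when one is present and falls back to controller.js with a deprecation warning. The descriptor name `extension.json` and the classes `RegistrationStyle` and `ExtensionRegistrationResolver` are assumptions made up for this example, not existing OpenRefine names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical: the ways an extension might register its components.
enum RegistrationStyle { DECLARATIVE, LEGACY_SCRIPT, NONE }

// Hypothetical resolver that supports both formats during a deprecation period.
final class ExtensionRegistrationResolver {
    // "extension.json" is an assumed name for the declarative descriptor.
    static final String DECLARATIVE_FILE = "extension.json";
    static final String LEGACY_FILE = "controller.js";

    // Collected deprecation warnings, so the caller decides how to surface them.
    final List<String> warnings = new ArrayList<>();

    RegistrationStyle resolve(String extensionName, Set<String> filesPresent) {
        if (filesPresent.contains(DECLARATIVE_FILE)) {
            // The declarative format wins even if a legacy script is also shipped.
            return RegistrationStyle.DECLARATIVE;
        }
        if (filesPresent.contains(LEGACY_FILE)) {
            warnings.add(extensionName + ": registration via " + LEGACY_FILE
                    + " is deprecated; please add " + DECLARATIVE_FILE);
            return RegistrationStyle.LEGACY_SCRIPT;
        }
        return RegistrationStyle.NONE;
    }
}
```

Letting the declarative file take precedence would allow an extension to ship both files and stay compatible with older and newer OpenRefine versions during the transition.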

Over and above restoring the old behavior, we should consider what rules we want to put in place for extensions and how (or if?) we want to enforce them. Undeclared usage of transitive dependencies has historically been a significant source of breakage in extensions, so anything we can do to improve that situation would be helpful. Alternatively, we could go in the direction of only doing major version updates to dependencies when we do major OpenRefine releases, but that seems pretty restrictive.

  • change of namespace from com.google.refine to org.openrefine

Is it implied that this includes the Maven re-modularization? If not, that's another set of package name changes to be added to the list.

As long as we're shuffling things around, it would be worth considering what, if anything, we want to hide in internal implementation packages that aren't accessible to (or at least documented for) extensions.

  • immutable project data storage with lazily computed operations between history steps

Does this imply/bundle changes to the evaluation infrastructure which is commonly used by extensions? (Row/RecordVisitor, Operation, etc.)

  • changes to the way frontend assets are bundled together: for instance, changes in how extensions are expected to initialize their contribution to the frontend (and un-initialize them too, perhaps?)
  • changes to operation registration, linking the backend and frontend components together (can probably be done in a backwards-compatible way without too much effort though)

Anything else? Anything you disagree with (such as things we shouldn't do at all?)

I think there was a grid cell renderer extension point introduced recently. Is that covered by the two items above?
The "Extensions" menu is a pretty limited (but safe) extension point. Are there additional front end extension points that could/should be contemplated?

Other items to consider:

  • Java 17 - historically we've been very conservative with bumping Java versions. Currently we require Java 11, but Jena 5 will require Java 17 and the Java ecosystem, in general, has been upgrading more quickly recently. Also, as an application rather than a library, there are fewer reasons for OpenRefine to be conservative in its usage of new Java features.

  • Jena 5 - I don't think we have a functional need for this, but the Jena project doesn't seem to do security updates for previous versions. Upgrading this has historically caused problems for the rdf-extension (perhaps a good opportunity to figure out long-term solutions to incompatible versions of common dependencies?)

  • Extension preferences - currently extensions are required to use Jackson and to embed a hardcoded class name in any custom preference data types. We should figure out a better scheme for this

  • REST API protocol review/update, versioning and documenting as a public API (could be broken into two or more items)

  • Incompatible evaluation results for GREL and/or GREL functions - are we committed to bug for bug compatibility? When / how are we allowed to change results of evaluations?

  • Crufty GREL function signatures - we have a number of functions which were extended over time in a backward compatible way, but the strictures of backward compatibility have given them funky definitions which could be cleaned up (e.g. Locale addition to various format/parse operations)

  • Operation history versioning & standardization - we've actively discouraged people from using this except in very limited contexts, but to what extent have they come to rely on it anyway and are going to be burned by any changes? What's their upgrade path? (Presumably some of this already changes with the project serialization format?)

There's probably more, but that seems like plenty to start with :grin:

Tom