Status of OpenRefine in Debian

Hi all

It's been a while since I posted an update, so without further ado, here it is. tl;dr:

OpenRefine is up to date in Debian testing and unstable (currently version 3.7.7). It is also available in Debian stable and bullseye-backports as version 3.6.2 with security updates.

We had a long Debian development freeze in the first half of 2023, and OpenRefine was affected by a bug in Rhino, a JavaScript engine, which took some effort to resolve. After I upgraded OpenRefine to 3.6.1, I discovered that the JavaScript frontend had stopped working entirely. There were no obvious error messages, and the Firefox web developer console didn't show anything meaningful either. After comparing the artifacts of the official release with the Debian ones, I saw that librhino-java in Debian was outdated and needed an update.

I upgraded that package to version 1.7.14, which in turn forced me to rebuild and test all reverse dependencies of the library. Most notably, the new version caused regressions in shrinksafe and in the horribly outdated version of closure-compiler in Debian. Unfortunately, the latter is an important package for the JavaScript ecosystem in Debian, so we couldn't just ditch it, and it was too late for a major update because of the freeze. You can read more about the details and the discussion with Debian's release team in #1036249 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1036249. In the end, those changes were approved, and OpenRefine 3.6.2 is now part of Debian 12 "Bookworm".

I have updated several OpenRefine dependencies and OpenRefine itself. Here is a short summary.

Package upgrades

  • jetty9 to versions 9.4.49 and 9.4.52
  • httpcomponents-core5 to versions 5.2, 5.2.1 and 5.2.2
  • httpcomponents-client5 to versions 5.2 and 5.2.1
  • openrefine-butterfly to versions 1.2.4 and 1.2.5
  • libmarc4j-java to 2.9.5
  • libjsonp2-java to 2.1.2
  • libtitanium-json-ld-java to 1.3.2
  • libwikidata-toolkit-java to 0.14.6
  • openrefine to 3.6.1, 3.6.2, 3.7.4, 3.7.5, 3.7.6, 3.7.7

Security updates

  • libowasp-antisamy-java to 1.7.4. The package is related to libowasp-encoder-java and was affected by numerous CVEs.
  • apache-jena to 4.5.0 and 4.9.0 to fix CVE-2021-33192, CVE-2021-39239, CVE-2022-28890, CVE-2023-22665 and CVE-2023-32200
  • jsoup to 1.15.3 to fix CVE-2022-36033
  • openrefine to fix CVE-2023-37476 (Debian bug #1051429) and to fix CVE-2023-41886 and CVE-2023-41887 (Debian bug #1053461)
    These issues are now fixed in all supported Debian releases, including bullseye-backports.

Backports

I updated the packages in bullseye-backports so that their versions match the ones in our current stable release, Bookworm. I intend to keep fixing security vulnerabilities there, but I encourage all users to use either OpenRefine in Debian stable or the development versions in testing and unstable.

Misc

Have you considered merging your class files into JARs to make your binary distribution clearer? At the moment, the build produces hundreds of class files, which are then installed into the /usr/share/openrefine directory. I believe this would help me solve some minor reproducible-build bugs like #1047753 and #1049586.

The odftoolkit issue has been resolved in the meantime. org.json:json is no longer non-free but was released into the public domain. It should be fine to upgrade to the next stable release 0.12.0 in the future.

What are your plans regarding Jetty? Jetty 9 is no longer supported upstream. I'm currently considering packaging Jetty 11 for Debian. Would that work for OpenRefine too?

Best,

Markus


Thanks for the update.

apo (December 17):

The odftoolkit issue has been resolved in the meantime. org.json:json is no longer non-free but was released into the public domain. It should be fine to upgrade to the next stable release 0.12.0 in the future.

That's good to hear. Hopefully a public domain dedication will be acceptable to CS&S.

What are your plans regarding Jetty? Jetty 9 is no longer supported upstream. I'm currently considering packaging Jetty 11 for Debian. Would that work for OpenRefine too?

I upgraded OpenRefine to Jetty 10 a few months ago. Jetty 11 moves the Servlet API from the javax.* to the jakarta.* package namespace, so it is a breaking change for extensions. Because of this, it makes sense to bundle it with other breaking changes, such as an improved extension development kit. While doing the Jetty 10 work, I also put together a branch for Jetty 11 for when we're ready for that move. (There's also an experimental Jetty 12 branch.)

Is Jetty 10 acceptable for Debian?

Tom

Is Jetty 10 acceptable for Debian?

Jetty 10 would definitely be an improvement over Jetty 9, but according to upstream, community support for both Jetty 10 and Jetty 11 ends on January 1st, 2024. Security support will officially end in 2025, when upstream will most likely announce the full EOL for both versions. I assume there will be some effort to provide security updates after that date because older Jetty versions are so widely used, but this is not certain.

Jetty 11 would be preferable because it is closer to Jetty 12, which makes it easier to backport security fixes. I have to plan for five years of security support: when the next Debian stable is released in 2025, it should be supported until at least 2030.

We already ship Tomcat 10 in Debian, which also uses the new Jakarta namespace, and shipping Jetty 11 or even Jetty 12 would bring some synergy effects as well. So if you could make the switch to Jetty 11 or Jetty 12 before 2025, that would be the best option in my opinion.

So if you could make the switch to Jetty 11 or Jetty 12 before 2025, that would be the best option in my opinion.

From a compatibility point of view, I don't think it's much more disruptive to go to Jetty 12, so if we need to go to at least Jetty 11, it probably makes sense to go all the way to Jetty 12 and not make extension writers upgrade twice. I'll refresh my branches to see how much work is involved.

Going back to your previous question about JARs vs .class files, the Mac kit build produces an openrefine-main.jar artefact (which we should probably add a version to), so I'm a little surprised that the Linux build doesn't do the same. Can you open an issue in the issue tracker for this to make sure we address it?

Thanks,
Tom

From a compatibility point of view, I don't think it's much more disruptive to go to Jetty 12, so if we need to go to at least Jetty 11, it probably makes sense to go all the way to Jetty 12 and not make extension writers upgrade twice. I'll refresh my branches to see how much work is involved.

I agree Jetty 12 would be the best option from a long-term security and community support standpoint. If that works, that'll be great.

Going back to your previous question about JARs vs .class files, the Mac kit build produces an openrefine-main.jar artefact (which we should probably add a version to), so I'm a little surprised that the Linux build doesn't do the same. Can you open an issue in the issue tracker for this to make sure we address it?

Done. It's issue #6257. Somehow I can't post a link to GitHub. (link added by @staff)

apo (December 19):


I agree Jetty 12 would be the best option from a long-term security and community support standpoint. If that works, that'll be great.

Actually, it looks like what we may want to do is skip Jetty 11 and go straight to Jetty 12, because it offers multiple Servlet environments, potentially allowing us to decouple the Jetty version update from the Servlet API update.
All the gory details are here: https://webtide.com/introducing-jetty-12/
but the summary is:

Multiple environments can be run simultaneously on the same server and Jetty-12 supports:

  • EE8 (Servlet 4.0) in the java.* namespace,
  • EE9 (Servlet 5.0) in the jakarta.* namespace with deprecated features
  • EE10 (Servlet 6.0) in the jakarta.* namespace without deprecated features.

This would potentially allow us to move to Jetty 12, but stick with the Servlet 4.0 APIs as implemented in Jetty 10, perhaps preserving compatibility for extension writers. The move to EE9, EE10, or EE11 APIs could then be done independently.

Going back to your previous question about JARs vs .class files, the Mac kit build produces an openrefine-main.jar artefact (which we should probably add a version to), so I'm a little surprised that the Linux build doesn't do the same. Can you open an issue in the issue tracker for this to make sure we address it?

Done. It's issue #6257. Somehow I can't post a link to GitHub.

Thanks!

Tom


apo (December 19):


I agree Jetty 12 would be the best option from a long-term security and community support standpoint. If that works, that'll be great.

Actually, it looks like what we may want to do is skip Jetty 11 and go straight to Jetty 12, because it offers multiple Servlet environments, potentially allowing us to decouple the Jetty version update from the Servlet API update.
[...]
This would potentially allow us to move to Jetty 12, but stick with the Servlet 4.0 APIs as implemented in Jetty 10, perhaps preserving compatibility for extension writers. The move to EE9, EE10, or EE11 APIs could then be done independently.

One potential issue with Jetty 12 is that it requires Java 17 as a minimum JVM. Our current minimum is Java 11, and we only dropped Java 8 support in November 2021. However, Java 17 has been out for two years and Java 21 has already been released, so perhaps it's time to stop supporting old Java releases for so long.

Tom


One potential issue with Jetty 12 is that it requires Java 17 as a minimum JVM. Our current minimum is Java 11, and we only dropped Java 8 support in November 2021. However, Java 17 has been out for two years and Java 21 has already been released, so perhaps it's time to stop supporting old Java releases for so long.

FTR: Java 17 is the default JVM in Debian stable, and we aim for complete Java 21 support next year at the latest. We will definitely release the next Debian stable with Java 21 because it is a long-term support release.

I'd say we can definitely stop supporting older Java releases. There's no longer a need to do so. It is especially compelling that labs, universities, companies and other users can also use our Windows version with Java embedded. So I agree: we can probably move forward faster, in step with Java's roadmap and releases.