OpenRefine Groovy extension: built-in, yes please?

I'm at the point where I'm pretty tired of custom-building OpenRefine nearly daily just to get the Groovy extension to init(). I review PRs, run tests on Windows, and use Groovy as an expression language as part of that rigor. The custom build is needed because of some weird Butterfly issue, or possibly Rhino's magic object, which I'm not 100% sure Butterfly is handling well now. See "Initializes but does not register in OpenRefine as an expression language" (thadguidry/refine-groovy issue #2 on GitHub).

We have spoken before about perhaps rolling the Groovy extension in as a built-in instead of keeping it in my separate repo, and I'd like to spark that conversation up again. OpenRefine is being taught more and more (for example, teachers and professors have asked us about GREL training videos). GREL is not always the first language folks have at hand; many more people know JavaScript, Python, Groovy, and Ruby. We already have Python as a built-in, and I'd like us to go one step further and make Antonin's JavaScript and Thad's Groovy extensions built-in to OpenRefine as well. I'd then work on the Ruby extension, as previously explained in "Ruby as an expression language".

Any comments, or disagreement with sound reasoning, about adding the JavaScript and Groovy expression languages as built-in extensions?

Here are some general thoughts on this topic, though I would not consider any of them a blocker for your specific question.

  1. I currently prefer GREL because of its stable interface. With Python, for example, I would regularly have to rewrite many of my examples and recipes to reflect language changes and the best practices that result from them.
  2. For languages that are not native to the JRE, like JavaScript and Python, we should imho revisit the integration of GraalVM. As commented in another thread, you can include GraalPy with Maven on a plain OpenJDK without having to switch to GraalVM as the runtime environment. This was already discussed in "Add support for Python 3 as expression language" (OpenRefine/OpenRefine issue #2249) and "`Edit cells > Transform > Language` support for R" (OpenRefine/OpenRefine issue #1226).
  3. If we want to offer our users more language support, we should increase not only quantity but also quality. That means, for example, some basic code editor support such as Ace, CodeMirror or Monaco (see "Add syntax highlighting to transform textarea", OpenRefine/OpenRefine issue #153, and "Add autocomplete to Transform expression dialog", OpenRefine/OpenRefine issue #684).

As these topics have already been discussed on GitHub, I mainly add them here as references to improve their visibility in the forum.

Expression languages are generally quite heavy (in terms of storage space taken up by the install), for instance Jython weighs 49 MB on its own, so it's worth being cautious about including more of them by default.

Another problem is that they so far lack OpenRefine-specific functions like cross or facetCount, which still forces people to learn GREL if they want to use these. The refine-js extension I made suffers from this problem and many more: JS doesn't include many utility functions by default, and people generally rely on small NPM packages to complement it, but there is no way to install NPM packages with this extension. So it's more or less useless; it was meant as a proof of concept.

It's probably quite subjective, but Groovy doesn't strike me as a particularly well-known language among our user base. Also, including it by default will likely cause difficulties for our downstream packaging on Ubuntu/Debian, because only Groovy 2.4 is currently packaged there (due to its interdependencies with Gradle, which is very hard to package for Debian).

Oh, that's a very good point about downstream packaging, and a fair one, too, about the missing OpenRefine-specific functions. I hadn't thought of that. Hmm, in light of that, I'm going to park this discussion for now.

I'll bear my own personal burden and continue my daily builds. But boy, it would be great to finally figure out the real issue behind the packaging.

I think GREL is more of a barrier for our non-developer user base because it introduces many concepts that are specific to programming. The closest reference point for them would be a spreadsheet formula, not a programming language. In my experience, users who already know how to program ramp up quickly on GREL because, to them, it is just another language.