Reproducibility project: December 2024 report

Here is a summary of what I was up to last month on the reproducibility project.

The development work was focused on implementing user-facing features enabled by the earlier development effort. This means:

  • giving the user the opportunity to adapt column names in a JSON recipe before applying it to a new project. Before running the recipe, OpenRefine checks that all the columns it requires are present in the project; if some are missing, it lets the user select other columns to use instead. Similarly, it checks that the new columns created by the recipe do not already exist in the project, so that applying the recipe doesn't fail because of conflicting names, and it lets the user rename the created columns to avoid such conflicts.
  • offering a graph-based representation of a recipe to help the user visualize what it does. My hope is that this representation could eventually replace the textual (JSON) representation of recipes entirely. To the user, a recipe would just be a file which would get rendered in this graph-based fashion by OpenRefine.
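
To make the column checks in the first point more concrete, here is a minimal sketch of the idea in Python. It is not OpenRefine's actual implementation, and it does not use OpenRefine's real JSON schema: the `Step` class, its `requires` and `creates` fields, and the `check_recipe` function are all hypothetical names made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical, simplified recipe step (not OpenRefine's JSON format)."""
    name: str
    requires: list = field(default_factory=list)  # columns the step reads
    creates: list = field(default_factory=list)   # columns the step adds


def check_recipe(steps, project_columns, renames=None):
    """Return (missing, conflicting) column names for a recipe.

    `renames` maps column names used in the recipe to the names chosen
    by the user; it is applied to both required and created columns.
    """
    renames = renames or {}
    available = set(project_columns)
    missing, conflicting = [], []
    for step in steps:
        for col in step.requires:
            col = renames.get(col, col)
            if col not in available and col not in missing:
                missing.append(col)
        for col in step.creates:
            col = renames.get(col, col)
            if col in available and col not in conflicting:
                conflicting.append(col)
            # Later steps may depend on a column created by this one.
            available.add(col)
    return missing, conflicting


# Example recipe with made-up steps and column names.
recipe = [
    Step("trim whitespace", requires=["city"]),
    Step("derive code", requires=["city"], creates=["city_code"]),
    Step("fill down", requires=["city_code"]),
]

# The project has no "city" column and already has a "city_code" column.
problems = check_recipe(recipe, ["town", "city_code"])

# After the user maps "city" to "town" and renames the created column.
fixed = check_recipe(
    recipe,
    ["town", "city_code"],
    renames={"city": "town", "city_code": "code"},
)
```

In this sketch, `problems` reports one missing column and one conflicting column, while `fixed` reports none once the user has chosen replacement names. Tracking the columns created by earlier steps matters because a later step may legitimately require a column that does not exist in the project yet.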

Concretely, those two features are currently implemented in a new dialog shown after the user tries to apply a JSON recipe (and before the recipe is actually run). I'll open a separate post to get feedback on some design questions I have encountered as part of that.

As in previous months, this activity wasn't visible through pull requests, because I have a growing backlog of commits to submit as PRs. I try to give ample time for any interested reviewers to chime in on those PRs, and merge them after a couple of weeks if there is no feedback. The current status of my work can be seen in the work branch of my fork. If anyone is interested in reviewing my work, I would like to know what I can do to make that easier.

In January, I'll be working on adding support for operation icons (following the earlier design work in Operation logos - #18 by antonin_d). The icons should be used in various places in the UI, and should be particularly useful in the graphical recipe representation, because embedding long operation descriptions in graph nodes is inconvenient.

Here is a post requesting design feedback around column selection: