Reproducibility project: March 2024 report

Here is a progress report on the reproducibility project for March.

On Zoe's side

In March, I had the pleasure of getting to know more members of the OpenRefine community, including Tom Morris and Esther Jackson. It was great to hear more about the history of the project, how OpenRefine has been useful in archival settings, and more. Always nice to put faces to names.

I also interviewed Thad and Steve in order to test out the new developments in Operation History and reconciliation. I have more user interviews scheduled this month, and am looking forward to getting more feedback from experienced OpenRefine users.

I’ve been thinking about how to make OpenRefine a bit more designer-friendly, and requested a slight workflow change in GitHub. I think it will be helpful for designers to have a centralized place, or in this case a table, to find tasks that require design input. I make a point to check it often.

Regarding operation history design, I’ve been iterating on the feedback shared in the forum here. I’ve also shared my progress on icons (thanks to all who gave their feedback!) and have moved from paper and pencil to iterating in Figma.

I’m looking forward to speaking with more of you regarding feedback on the higher fidelity mockups and continuing to iterate and improve.

On Antonin's side

On the programming side of things, my work was concentrated on reorganizing my changes into reviewable chunks. The focus was on identifying as many non-breaking changes as possible that can be merged into master now, without further coordination. Concretely, this meant:

  • Backporting Java tests that I have written for operations as I migrated them to the new architecture (primarily #6441 and #6466). This will help validate that re-implementing operations with the immutable data model does not change their behavior, at least in those simple test cases.
  • Preparing for the introduction of more granular Maven modules. This involves removing unnecessary dependencies between various parts of the code base (such as between clustering and GREL or between XLS support and project management) and migrating the registration of various sorts of components out of static Java code blocks into the controller.js (such as for exporters or for expression languages).
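To illustrate the second point, here is a minimal sketch of what moving registration out of a static block can look like. All names in it (`ExporterRegistry`, `CsvExporter`, `initModule`) are hypothetical and are not OpenRefine's actual classes; `initModule` only stands in for the role that a module's controller.js plays. The point is that the registry no longer needs a compile-time reference to the components it holds, which is what lets them live in separate Maven modules.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical names throughout: this is not OpenRefine's actual API.
class RegistrySketch {

    /** A stand-in for any pluggable component, such as an exporter. */
    interface Exporter {
        String contentType();
    }

    /** Central registry that holds no compile-time reference to concrete exporters. */
    static final class ExporterRegistry {
        private static final Map<String, Exporter> EXPORTERS = new LinkedHashMap<>();

        // Before the change, a static { ... } block here would have instantiated
        // every exporter, forcing this class to depend on all of them.

        static void register(String name, Exporter exporter) {
            EXPORTERS.put(name, exporter);
        }

        static Exporter get(String name) {
            return EXPORTERS.get(name);
        }

        static Set<String> names() {
            return EXPORTERS.keySet();
        }
    }

    /** Can live in its own module; the registry does not need to know about it. */
    static final class CsvExporter implements Exporter {
        @Override
        public String contentType() {
            return "text/csv";
        }
    }

    /** Plays the role of the module's start-up script: registration is explicit. */
    static void initModule() {
        ExporterRegistry.register("csv", new CsvExporter());
    }

    public static void main(String[] args) {
        initModule();
        System.out.println(ExporterRegistry.names());
        System.out.println(ExporterRegistry.get("csv").contentType());
    }
}
```

With this layout, the dependency points from the module that provides a component to the registry, and not the other way around.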

As part of this effort I have convinced myself that it's actually possible to introduce the more modular Maven setup seamlessly (meaning, without any necessary adaptation for extensions). I hope to be able to open a PR for that soon, once my preparation PRs are merged. This month I also plan to try backporting other changes:

  • Resizable columns
  • Stored errors for the Wikibase upload operation

I also want to try to extract any other meaningful preparation steps for the change of architecture, just by inspecting the diffs and seeing if something comes to mind.

My development time was also shared with the 3.8 release process, where I tried to address the most pressing needs (primarily fixing a severe bug introduced by adding support for recon errors, and improving the new release notification so that it can be used to announce our upcoming survey), and with reviewing PRs from GSoC applicants.

All this seems to be quite far from reproducibility, you might say, and you'd be right! On that front I have been coordinating with Zoe on the design proposals for the undo/redo menu. Besides that I have been working on the design of the graphical representation of the history (which we currently call "pipeline"). To do so, I am trying to get away from the more theoretical standpoint I have had so far (designing what each operation should look like in isolation) and instead trying to draw the graphical representation of concrete lists of operations that people actually use. The intention is to identify whether the pipeline view is successful in conveying relevant dependency information between the operations and whether we run into unexpected layout issues (too many columns? too many operations? not enough room to display a particular element?). As part of that I have been following some OpenRefine tutorials and seeing what the resulting pipelines look like. I also used this opportunity to suggest some improvements to one of the Programming Historian lessons in passing. I plan to do more of that this month.
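To make "dependency information between the operations" a bit more concrete, here is a minimal sketch, assuming each operation can be summarized by the set of columns it reads and the set of columns it writes. This model and the names `Operation` and `dependencies` are hypothetical simplifications for illustration, not how OpenRefine represents its history.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: OpenRefine's real operation metadata may look different.
class PipelineSketch {

    /** An operation described only by the columns it reads and the columns it writes. */
    record Operation(String name, Set<String> reads, Set<String> writes) {}

    /**
     * For each operation, lists the indices of the earlier operations that last wrote
     * one of the columns it reads. Columns never written before come from the original
     * data and therefore create no dependency.
     */
    static List<List<Integer>> dependencies(List<Operation> history) {
        Map<String, Integer> lastWriter = new HashMap<>();
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < history.size(); i++) {
            Operation op = history.get(i);
            List<Integer> deps = new ArrayList<>();
            for (String column : op.reads()) {
                Integer writer = lastWriter.get(column);
                if (writer != null && !deps.contains(writer)) {
                    deps.add(writer);
                }
            }
            deps.sort(null); // natural ordering, so the result does not depend on set iteration order
            result.add(deps);
            // Record this operation as the latest writer only after resolving its own reads.
            for (String column : op.writes()) {
                lastWriter.put(column, i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Operation> history = List.of(
            new Operation("trim whitespace in 'city'", Set.of("city"), Set.of("city")),
            new Operation("uppercase 'country'", Set.of("country"), Set.of("country")),
            new Operation("join 'city' and 'country' into 'place'",
                Set.of("city", "country"), Set.of("place")));
        System.out.println(dependencies(history)); // prints [[], [], [0, 1]]
    }
}
```

In this example the first two operations are independent of each other, while the third depends on both. This is the kind of relationship the pipeline view is meant to make visible.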