Reproducibility project: January 2024 report

Here is an overview of what we worked on for the reproducibility project in January. This month, the post also includes a report from Zoe.

Zoe's report

To start off the month and get to know OpenRefine better, I worked on an exercise using the data from the Reporters Without Borders award page. I practiced splitting, reconciliation, and schema creation for Wikidata entries. As I’m new to the tool - and new to Wikidata - it was helpful to work through this exercise and see what questions emerged for me. I got to know how to think about statements, schemas, and more. I also found Wikimedia’s own materials helpful to look through.

As I went through the exercise, I tried to take advantage of my fresh eyes on the tool to identify places where the design feels clunky and unintuitive, particularly to non-technical users like myself. I kept notes along the way and created GitHub issues as problems emerged. Here’s an overview of the GitHub issues I raised with the community:

  • Place 'edit' and ‘remove’ functionalities within the same menu tier #6280

  • Renaming 'View' in menu #6281

  • Undo/Redo tab button #6287

  • Opening a project in project list #6288

  • Calendar text #6300

  • Entering Columns names in Wikibase schema fields #6301

  • Removing rows #6302

  • Refresh when working in schema #6303

  • Unfolding triangle in Wikibase edit preview #6304 (this one has been picked up!)

  • Uploading edits in Wikibase preview #6305

  • Make it clear that summary is mandatory for Wikibase upload #6306

  • Improve renaming columns #6282

I also opened one pull request (pull #277), making slight improvements to the design documentation.

As I shared in the forum here, I also jumped into the research phase of the Undo/Redo visualization project by deep diving into desk research. To begin, I took a look at how other tools have designed their operation history function, including tools for editing very different kinds of documents (photo editing tools, writing composition tools, spreadsheet tools). This process helped me gather inspiration - and reveal my assumptions - when it comes to working with operation history.

I took notes on my desk research and shared them in this Google Doc.

Antonin's report

This month I focused on preparing a user testing campaign. The goal of this effort is to evaluate the preparatory work I have done over the past year, which covers the following changes from a user perspective:

  • displaying partial results of long-running operations
  • crash recovery for long-running operations
  • new UI for long-running processes (as a dedicated tab)
  • concurrent long-running operations

In themselves, these features do not directly improve reproducibility, but they are related to it internally, because they also involve identifying which parts of the grid are touched by a given operation. This is necessary for:

  • validating that a series of operations can be applied to a dataset, by having a uniform interface to determine which columns are read and touched by operations
  • determining which operations need to be recomputed or discarded if an earlier operation is undone
  • offering a graphical visualization of the dependencies between operations
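
To make the first two points more concrete, here is a minimal Python sketch of the idea. It is not OpenRefine's actual code (which is written in Java): the `Operation` class, its `reads` and `writes` fields, the two helper functions, and the sample operations are all hypothetical names chosen for illustration. It assumes each operation can declare the columns it reads and the columns it touches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """Hypothetical operation that declares its column dependencies."""
    name: str
    reads: frozenset   # columns the operation reads
    writes: frozenset  # columns the operation creates or modifies


def validate(operations, initial_columns):
    """Check that each operation only reads columns that exist when it runs.

    Returns None if the whole series can be applied, otherwise a tuple
    (operation name, set of missing columns) for the first failure.
    """
    available = set(initial_columns)
    for op in operations:
        missing = op.reads - available
        if missing:
            return op.name, set(missing)
        available |= op.writes
    return None


def affected_by_undo(operations, undone_index):
    """List later operations that depend on the undone one, directly or not.

    This is conservative: a column stays marked as affected even if a later
    operation overwrites it completely.
    """
    tainted = set(operations[undone_index].writes)
    affected = []
    for op in operations[undone_index + 1:]:
        if op.reads & tainted:
            affected.append(op.name)
            tainted |= op.writes
    return affected


history = [
    Operation("split name", frozenset({"name"}), frozenset({"first", "last"})),
    Operation("reconcile last", frozenset({"last"}), frozenset({"last"})),
    Operation("uppercase country", frozenset({"country"}), frozenset({"country"})),
    Operation("join names", frozenset({"first", "last"}), frozenset({"full"})),
]

print(validate(history, {"name", "country"}))
print(affected_by_undo(history, 0))
```

In this sketch, undoing "split name" flags "reconcile last" and "join names" for recomputation or removal, while "uppercase country" is left alone because it reads and touches an unrelated column. The same dependency information could also feed a graphical view of how operations relate to each other.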

To prepare for this testing campaign, I have come up with a small exercise for our testers, asking them to go through a sample data cleaning project where the changes above are visible. The goal is to observe their intuitive reaction to those changes. For this first testing campaign, we plan to ask users who are already familiar with OpenRefine, so that we get a sense of how our existing user base will react to those changes.

Beyond drafting the data cleaning exercise, most of my development time went into fixing bugs that Zoe and I encountered ourselves while going through the exercise. This covered the following topics:

  • unclear progress reporting on slow operations or large datasets (the progress stays at 0% for a long time even when the operation is actually advancing, so it is unclear to the user whether the operation is stuck or still working)
  • some bugs related to the isolation of column dependencies of operations, for instance a reconciliation operation taking the values to reconcile from the wrong column
  • some bugs related to synchronization between concurrent operations via the WatchService, because I was not using WatchService correctly
  • some other bugs which were actually present in our master branch, for which I opened issues (#6326, #6286, #6328, #6329, #6330)

In parallel to planning this testing session, I have also worked with Zoe on the design of the actual reproducibility improvements, with changes to the undo/redo tab to come first. Although we could potentially make wide-ranging changes, I am intuitively in favor of going for smaller, focused changes first.

I also did another synchronization with master (merging master into 4.0), so that this testing round can also bring up some feedback on the changes that will make it into 3.8. Finally, I prepared a presentation about the project and presented it at FOSDEM. You can see the video here:

This was the occasion to ask interested people to sign up for our testing campaign, which you can also do here:


@zoecooper, if you need help connecting with contributors and users for your research, I can introduce you to people I interviewed last year or help you identify relevant contacts to reach out to directly.

@Martin great, thank you!