Reproducibility project: February 2024 report

antonin_d · March 5, 2024, 9:28am

Here is a progress report of the reproducibility project in February!

On Zoe's side

Aside from being sick for a week, I made progress on sketching the UI for operation history. Much of the design thinking, sketching, and discussion revolved around the best ways to visually (and textually) signal to users which subsequent operations would be affected by potential changes (deletion, necessary recalculation, etc.), and how each step may relate to others in the history. Sketching sessions helped me think through the complexity and experiment with different visualization solutions, both in terms of how Operation History might appear in the overall interface, and the look and UX of the Operation History window itself.

We landed on two different design approaches to test: one relies more heavily on text-based warning panels that tell the user the consequences of a potential change to the history; the other relies more on visualization and color to indicate these possible changes. As we move into the interviews this month and show these designs to more people, I look forward to learning from their feedback.

Another aspect of the interface I worked on was operation logos (or icons, or symbols, as we might refer to them). I’ve been developing logos for each type of operation that could sit next to it in the history. This idea arose from the desk research phase in the previous month, which helped me think more creatively and broadly about different UX metaphors intended to help with usability. Here’s a link to the first iteration of the sketches; I’d love more feedback on them. They’ve been posted in both the forum and GitHub for maximum visibility within the community.

I’ve also been thinking about the way we invite designers into OpenRefine, and how we might make the documentation clearer for designers who are new to the project and new to open source work (GitHub, etc.).

As I look ahead, I’ve been thinking about future design projects I’d like to work on with OpenRefine and have shared them here:

Two GitHub issues that came up and sparked conversation:

github.com/OpenRefine/OpenRefine

Checkbox UX

opened 11:28AM - 08 Feb 24 UTC

cooperzoe

Type: Feature Request Theme: UX/Usability

In the initial data import phase, when I go to specify the number of rows I'd like to load (or discard), I notice that I need to both fill in the number in the blank space and check the box. If I don't check the box, the limitation I set won't work. From a UX perspective, this feels redundant and unclear. I think it would be more intuitive for the user to have that limitation automatically confirmed upon entering the number. Screenshot: https://github.com/OpenRefine/OpenRefine/assets/52141007/7e982429-2fb3-4f1c-8331-46adbf32c690

github.com/OpenRefine/OpenRefine

OK & CANCEL button placement

opened 11:32AM - 08 Feb 24 UTC

cooperzoe

Type: Feature Request Theme: UX/Usability design proposal needed

In the newest version of OpenRefine, I notice the OK button is on the left-hand side and the CANCEL button on the right in this window. I expected the OK button to be on the right-hand side and found its unusual placement counterintuitive. I propose we either switch their placements, or put OK in the right-hand corner and keep CANCEL next to it. If we choose the latter option, perhaps we could make the OK button blue, so they don't look too similar to one another. Screenshot: https://github.com/OpenRefine/OpenRefine/assets/52141007/bd5e1f3b-0684-4b57-9db4-13e56c6ab3ba

On Antonin's side

This month, my work on the reproducibility project was split between three main tasks:

Addressing bugs discovered during interactive testing by Zoe or myself, in preparation for our first testing campaign.
In this first testing campaign, we plan to ask experienced OpenRefine users to go through a sample data cleaning task with us, working with OpenRefine from the 4.0 branch. The goal is to observe their reactions to various preliminary changes made as part of our reproducibility improvements: handling of partial results of long-running operations, new process panel, ability to run operations in parallel, and so on. We also anticipate they will discover more bugs during this testing.
Work on restructuring the commit structure of the 4.0 branch, following the approach proposed in the November 2023 report.
This primarily consisted of refactoring the 3.x test suite to align the structure of its tests with that of the 4.0 test suite. See the corresponding pull requests: #6371, #6383, #6388 and #6389. In parallel, I have done similar work on the 4.0 branch, comparing the test suites to identify any new test cases I added.
Participating in the design of the history UI changes, together with Zoe. For now we are working on UI mock-ups which aim at relatively small changes to the existing history tab UI, attempting to add support for making various sorts of changes to the list of operations (see below).
For now, we are not actively looking at a way to integrate a graphical representation of the operations list, nor at making changes to the Extract/Apply functionality. My intuition is to push for incremental changes, to make sure we have the capacity to deliver relatable and implementable proposals first.
But the order is debatable. Perhaps we should have started with the graphical history representation and added those new features on top of it. By putting those features first, my hope is that it also becomes clear to the broader team what user needs we are trying to address, as the benefits of a graphical history representation are likely less tangible.

Here are the interactions we are trying to enable with the history tab:

Deleting an old operation without discarding the following operations (#183, #369, mailing list thread).
Re-computing an old operation without discarding the following operations (#655).
Changing the settings of an earlier operation, again without discarding the following operations (no issue yet as far as I know).

Those are all things a user could want to do on a particular history entry. Are there other such actions we should have on our radar? How would you prioritize them?

Internally, the three actions listed above pose the same sort of challenge in the backend: one needs to be able to detect the potential effects of the action on the grid, and determine which of the future operations can be kept. This will rely on the columnar metadata exposed by operations in the new architecture, letting the backend enforce certain guarantees about which parts of the grid are touched by the operations.
For instance, if the user deletes an operation that creates a new column, any future operations that make edits in that column would also get deleted in the same go, but operations making edits to other columns could be preserved. Similarly, if the user deletes an operation that made changes in a column, any future long-running operation which depends on this column will need re-computing, since its input data will have changed. I am therefore working with the assumption that when the user requests the deletion of a particular history entry, the backend will be able to produce a list of operations that will need re-computing or will be discarded. This would be produced on a best-effort basis: by default, in the absence of sufficient columnar metadata, the backend would fall back on discarding as many future operations as needed. The question of how to convey those potential effects to the user was central to this month's design work.
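
To make the deletion case more concrete, here is a minimal Python sketch of how such a best-effort analysis might work. It is only an illustration under stated assumptions, not the actual backend code: the `Operation` class, its `reads`/`writes`/`creates`/`has_metadata` fields and the `plan_deletion` function are all hypothetical names. It also assumes that operations which are not long-running are cheap to replay, so they are simply kept even when their input changes.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    """Hypothetical history entry; every field name here is an assumption."""
    name: str
    reads: set = field(default_factory=set)    # columns the operation depends on
    writes: set = field(default_factory=set)   # existing columns it edits
    creates: set = field(default_factory=set)  # new columns it adds
    long_running: bool = False
    has_metadata: bool = True  # False when columnar metadata is unavailable


def plan_deletion(history, index):
    """Sort the operations after history[index] into keep / recompute / discard."""
    deleted, later = history[index], history[index + 1:]
    plan = {"keep": [], "recompute": [], "discard": []}
    if not deleted.has_metadata:
        # Fallback: without metadata, nothing after the deleted entry is safe.
        plan["discard"] = [op.name for op in later]
        return plan
    missing = set(deleted.creates)  # columns that no longer exist
    changed = set(deleted.writes)   # columns whose contents will differ
    for pos, op in enumerate(later):
        if not op.has_metadata:
            # Best effort: discard this operation and everything after it.
            plan["discard"] += [o.name for o in later[pos:]]
            break
        if (op.reads | op.writes) & missing:
            # The operation touches a column that is gone: discard it too.
            plan["discard"].append(op.name)
            missing |= op.creates
            changed |= op.writes - missing
        elif op.reads & changed:
            # Its input data changed, so everything it produces changes as well.
            changed |= op.writes | op.creates
            plan["recompute" if op.long_running else "keep"].append(op.name)
        else:
            plan["keep"].append(op.name)
    return plan


# Made-up sample history, for illustration only.
history = [
    Operation("split name", reads={"name"}, creates={"first"}),
    Operation("trim first", reads={"first"}, writes={"first"}),
    Operation("uppercase city", reads={"city"}, writes={"city"}),
    Operation("reconcile city", reads={"city"}, creates={"city_id"}, long_running=True),
]

print(plan_deletion(history, 0))
print(plan_deletion(history, 2))
```

With this sample history, deleting the first entry discards only `trim first`, while deleting `uppercase city` marks `reconcile city` for re-computation. An entry without metadata anywhere after the deleted one causes it and everything after it to be discarded, which is the conservative fallback described above.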

Apart from that, the workload on general OpenRefine development was noticeably higher this month, with the release of 3.8-beta1 and the preparation for GSoC, primarily.
