Reproducibility project: May 2024 report

May was quite a short month for me, since I took a break on top of the usual bank holidays, so I don't have a ton to show for it. I was also sick for a few days last week.

This was the last month of @zoecooper's contract, so my focus was on helping her finish the design work on the reproducibility features. The goal was to shift away from a design primarily focused on producing a graphical representation of an operation history, and to put more emphasis on the ability to actually replay such a series of operations. I don't think we are quite at the stage where things can be implemented, but the effort has at least had the merit of starting conversations and getting concrete proposals in front of the community.

On the implementation side, I have continued to submit pull requests with preparatory work, this month primarily around the Maven modularization (#6651), but also other small fixes that can be backported as is (#6648, #6645). I plan to abandon the scalability improvements (working on projects that don't fit in RAM, incremental grid updates, parallel operations), since I don't see how they could be introduced incrementally. Instead, the reproducibility improvements can be introduced as lightweight changes on top of the existing architecture.

The release of 3.8.0, and the breakage of the Commons extension with this new release, brought back the need to clarify our extension points and offer better stability guarantees. I have been working with @Sebastian and @tfmorris to fix the immediate problem, but it also prompts longer-term discussions about how to address it. I consider this an important problem to solve, because without it, it is difficult to make any significant changes to OpenRefine.

In June, I therefore plan to keep backporting changes from my existing reproducibility work, but primarily to shift my focus to this extensibility/stability problem. We'll also have the BarCamp, which might help inform future directions.

I also plan to pause my work on this project this summer by taking a few months of unpaid leave. The intention is to let the PR backlog settle, easing the pressure on reviewers (so far exclusively @tfmorris), and to give others more room to take the initiative. I'll still be around for GSoC mentoring and for coordination with WMSE's development work.

As a side note (I'll open a separate conversation about it): because we might want to start the 3.9 release process as early as this summer, I wonder if someone else would be interested in wearing the release manager hat this time. There is no particular pressure, though: I could still take care of it this autumn, in which case the release would also include all the changes made during GSoC.


Thanks for the update, and for splitting the PRs into reviewable-sized chunks!

The intention is to let the PR backlog settle, easing the pressure on reviewers (so far exclusively @tfmorris)

For the record, I don't feel the need to be the sole PR reviewer, so I'd be happy to have others jump in and help! The current review load hasn't left me much time to work on the features and bug fixes that I'm interested in.

we could want to start the 3.9 release process this summer already, I wonder if someone else would be interested in taking over the release manager hat on that occasion.

I'm happy to be the release manager for 3.9, but I'd also be happy to share the responsibility with an understudy if someone else wanted to learn the process.

Tom

P.S. I don't think that stabilizing the extension mechanism requires earthshaking changes, but I'll reply in more detail on one of the other threads.