Reproducibility project: October report

Here is a summary of what I worked on in the reproducibility project last month.

On the code review side, @tfmorris asked that my branch be rebased on top of a recent version of the master branch. Because he also expressed interest in having a normalized order for import statements in Java code, I have started work on this first, so that the import order can be normalized in the master branch and the rebasing can happen after that. To go ahead with this, I am waiting for pull request #6108 in OpenRefine/OpenRefine ("Normalize Java import order") to be merged. Once that's in, it's still unclear to what extent the rebasing will be doable, but I am hoping that I can build an appropriate workflow to ease at least some parts of it.

In the meantime, I am continuing work on the introduction of columnar information in operations. I have written up a summary of my approach so far in this thread:

I have also explored a possible architecture for row-wise concurrency, making it possible to run two operations in parallel even if their columnar scopes overlap, as long as both of them operate fully row- or record-wise. I will post a description of my approach for this soon.
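As a rough illustration of the rule stated above (not the actual implementation), the check for whether two operations may run in parallel could look like the sketch below. All class, enum and method names are hypothetical and do not come from the OpenRefine code base. The sketch also assumes that both operations must use the same mode (both row-wise or both record-wise), and that any other pair needs disjoint column scopes.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch: these names are illustrative only, not OpenRefine APIs.
final class OperationScope {
    // How an operation traverses the grid. GLOBAL stands for anything that
    // is not purely row-wise or record-wise (for example, a sort).
    enum Mode { ROW_WISE, RECORD_WISE, GLOBAL }

    final Mode mode;
    final Set<String> readColumns;
    final Set<String> writtenColumns;

    OperationScope(Mode mode, Set<String> readColumns, Set<String> writtenColumns) {
        this.mode = mode;
        this.readColumns = readColumns;
        this.writtenColumns = writtenColumns;
    }

    // Decides whether `second` may start while `first` is still running.
    static boolean canRunConcurrently(OperationScope first, OperationScope second) {
        // Assumption: two operations in the same row- or record-wise mode can be
        // pipelined row by row (or record by record), even if their columns overlap.
        if (first.mode != Mode.GLOBAL && first.mode == second.mode) {
            return true;
        }
        // Otherwise, require that neither operation touches a column the other writes.
        return Collections.disjoint(first.writtenColumns, second.readColumns)
                && Collections.disjoint(first.writtenColumns, second.writtenColumns)
                && Collections.disjoint(first.readColumns, second.writtenColumns);
    }
}
```

With this sketch, two row-wise operations that both write to the same column are allowed to overlap in time, whereas a row-wise operation followed by a global one reading that column is not.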

My goal is to have most of the internal architectural changes in place by the end of the year, making it possible to focus on the user-facing features when the designer we are trying to hire comes on board. This hiring process should conclude soon, but whoever comes on board will likely need some time to get to know the project before they can dive into user research and the design of the interfaces.
