Reproducibility project: August report

Here is a quick update on my progress on the reproducibility project.

As expected, this was a relatively quiet month: I took some holidays, and I had to keep the lights on for Outreachy mentoring and other project duties. Still, I started work on reorganizing the commit history of the 4.0 branch to make it more logical and easier to review. It's an interesting challenge, which has pushed me to improve my git skills. I have been trying various approaches to see what works best.

In a nutshell, the 4.0 branch differs from master by about 500 commits. It branched off 3.5 years ago and I have merged master into it every couple of months (although not at very regular intervals). When doing the first merge, I considered rebasing the branch on top of master instead. Rebasing generated much more work to resolve conflicts, because a single change in master could cause conflicts in many different commits on my branch, so it felt like I was needlessly redoing a lot of work. Merging was much simpler, in the sense that every conflict is resolved only once, so I went for that. It also made sense to keep the commit ids stable, as my branch was published early on in the official repository.

Although I tried to keep the commit log clean, the order of the commits is not very logical. This is because I often discovered problems with my changes only after a while, when my work had already moved on to a different area. This is the problem I am trying to solve now: I am rearranging the commit history so that the branch is organized in logical chunks of commits that can be reviewed independently. Within those chunks I am also squashing commits together when they are clearly "fixups".

Concretely, I am rebasing with the --rebase-merges option, which lets me rebase my branch without discarding the merge commits with master that are in it. When re-ordering or squashing commits that are not separated by any merge commit, I can avoid most merge conflicts by reusing their resolutions from existing merges. When the commits need to move across merges, this is a bit more involved since the existing merge commits cannot be reused as-is in the rebased branch, but git's rerere feature is very helpful in reusing the resolutions that are unchanged.
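For illustration, here is a minimal sketch of this kind of rebase in a throwaway repository. The repository, file names and commit messages are made up for the example and have nothing to do with the actual 4.0 branch. It assumes git 2.18 or later (for --rebase-merges), and it contains no conflicts, so rerere is enabled but has nothing to replay here.

```shell
set -eu

# Throwaway repository (hypothetical, for illustration only)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.org"
git config user.name "Demo"
git config rerere.enabled true   # record conflict resolutions so they can be replayed

git commit -q --allow-empty -m "base"
main=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on local config
base=$(git rev-parse HEAD)

# Feature branch with one commit, then a merge of the main line into it
git checkout -q -b feature
echo a > a.txt; git add a.txt; git commit -q -m "feature: add a"
git checkout -q "$main"
echo m > m.txt; git add m.txt; git commit -q -m "main line: add m"
git checkout -q feature
git merge -q --no-edit "$main"

# Two more commits after the merge, then a late fix for the first of them
echo b > b.txt; git add b.txt; git commit -q -m "feature: add b"
target=$(git rev-parse HEAD)
echo c > c.txt; git add c.txt; git commit -q -m "feature: add c"
echo b2 > b.txt; git add b.txt; git commit -q --fixup="$target"

# Squash the fixup into its target while keeping the merge commit.
# GIT_SEQUENCE_EDITOR=true accepts the generated todo list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash --rebase-merges "$base"

merges=$(git rev-list --merges --count HEAD)
total=$(git rev-list --count HEAD)
fixups=$(git log --format=%s | grep -c '^fixup!' || true)
echo "merge commits: $merges, total commits: $total, leftover fixups: $fixups"
```

The fixup and its target both sit after the merge, which is the easy case described above: nothing has to move across a merge commit.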

My target end state is a branch where I can point to a couple of commits as meaningful intermediate states for reviewing the work, ideally located near merge commits. I should then be able to open a series of PRs which build on top of each other (which can be done by introducing ad-hoc branches) so that they can be reviewed in GitHub. They would not be meant to be merged as such: instead, the tip of the master branch would be set to the tip of my branch after a last merge. Before that, I would address PR comments with a similar rebase.
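As a rough sketch of the ad-hoc branches idea, again in a throwaway repository: the branch names review-part-1 and review-part-2 and the four placeholder commits are hypothetical, chosen only to show how each branch marks an intermediate state.

```shell
set -eu

# Throwaway repository with a short linear history (hypothetical)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.org"
git config user.name "Demo"
for i in 1 2 3 4; do
  git commit -q --allow-empty -m "step $i"
done

# Ad-hoc branches pointing at the intermediate states to review
git branch review-part-1 HEAD~2   # first chunk: steps 1 and 2
git branch review-part-2 HEAD     # second chunk: steps 3 and 4

# What each PR would show: the first against the base branch,
# the second against the first review branch.
part1=$(git rev-list --count review-part-1)
part2=$(git rev-list --count review-part-1..review-part-2)
echo "PR 1 would contain $part1 commits, PR 2 would contain $part2 commits"
```

Each PR in the series would then use the previous review branch as its base, so that its diff only shows the commits of its own chunk.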

@tfmorris @abbe98 let me know if that sounds reasonable to you.


Thanks for the hard work you do for all of us! Just wanted to express that.
