Thanks for doing this and apologies for the delay in reviewing. It seems like the base branch for the whole series of PRs is your wetneb/OpenRefine:4.0-point-0, but when I look at that it says that it is 3239 commits behind the current master branch. I was hoping that the base for the PRs would be the current code base so that it's a) something familiar and b) something that we can actually apply the PRs to as they are approved.
I looked at the PR which is supposed to be equivalent to the draft PR with the package renaming that I put together and transferred some of my comments over to it. I'll try and look at some of the rest to better understand the overall structure, but the starting point is a rather fundamental question.
Tom
Thanks for having a look!
Yes, so far I have kept the original branching point, because rebasing such a long list of commits on top of master did not really seem doable so far. Because I have been making regular merges, my plan would be to just do one final merge at the end. It leaves merge commits in the master branch, but I don't think that's a big problem (we have tons of them inherited from the times before we switched to squashing PRs and it hasn't been a problem as far as I can tell).
Maybe it's actually doable to do a big rebase on top of the current master branch, if git's rerere tool works well enough to avoid having to solve the same conflicts over and over. Or by squashing my commits more aggressively (for instance, one commit per PR that I opened on my fork).
For reviewing purposes I do think the somewhat dated version of the code base should not make so much difference, since the changes are mostly orthogonal to the rest of the changes the code base has seen since (which is what has made merging doable in the first place).
In any case, I don't think it would be appropriate to merge the PRs one by one, given that the intermediate points between them are rather unfinished states that will make it difficult for others to work on other PRs.
Because I have been making regular merges, my plan would be to just do one final merge at the end.
I don't think that's a viable strategy because it basically inverts the relationship between the branches and makes the 4.0 branch the master. There's too much chance of missing something on a merge that large.
This is an example of the danger: https://github.com/wetneb/OpenRefine/pull/334#discussion_r1355651488
In a commit which is nominally about reorganizing things, functionality has been removed. This was hidden in a PR of just a handful of commits and was almost missed by me. There's almost no chance that reviewers will find everything in a PR representing years of development.
A much safer and more reviewable way to do this is to carve out individual PRs which contain specific features or refactorings, get them reviewed, then merge them into a 'next-gen' branch which can be built up from the current HEAD of the mainline. There are probably other ways to organize this, but I don't think just blindly merging a giant PR containing years of unreviewed work is one that's tenable.
Tom
We should be adhering to our commitment to trunk-based development. Gitflow style development is largely discouraged now. Gitflow Workflow | Atlassian Git Tutorial and specifically Branch by Abstraction - Trunk Based Development
By writing "one final merge at the end" I do not mean that the changes should be reviewed in one batch: the whole point for my recent restructuring of this branch is to make it possible to review the changes in independent steps. So that does not change much in terms of reviewing, apart from the review of the resolution of merge conflicts.
As I wrote above I'll look into rebasing the entire branch on top of a recent version of master.