Thanks Tom - I would definitely do some things differently if I had to start this effort again.
I think one major mistake I made is that I kept all operations, facets, importers and exporters around while implementing the various drafts of this new architecture. Migrating all of those was a major effort, and for early architectural experiments it was not necessary to have them all available. Instead, I could have deleted all but a few of them (say, only keep the CSV importer, the "add column based on this column" operation, the URL fetching operation, the text facet and no exporter at all) and just tried different architectures in that reduced setting.
I am also not happy about the time it took to get to a point where we can ship something to users. It's a long leap of faith and a high risk for the project. It is also stressful for me, for sure. That being said, I do not want us to simply go for it because of the sunk cost. I count on your high standards and your care for the project to judge this new architecture on its merits. We still have time to make major changes if needed. I am still very grateful that you questioned the coupling to Spark a few months into the project: I had already invested a lot of time into that, but I am glad I listened to you and pivoted away from it.
That said, I don't really understand why you are saying that the 4.0 branch includes lots of orthogonal stuff. What are you thinking about? As I wrote earlier, I think I should have avoided the column resizing feature (which I thought was necessary for a good UX, but it was indeed a stretch), but the other bullet points I listed above really do depend on the new architecture from my perspective:
- the displaying of partially computed operations: I think it would be a really messy thing to implement in 3.x. You'd likely end up with long-running processes mutating the grid as they compute the operation, which would make it a nightmare to ensure correctness and to keep undo support working.
- the new UI for the process queue: I guess you could implement that UI in 3.x quite easily, but I don't think it makes a lot of sense on its own, given that queuing processes is not really usable in 3.x (since the grid is not updated as long as the first process in the queue is running).
- pausing and resuming operations: that could also be done, but it is not very useful if we are not displaying the partial results, I think. Also, it would have to rely on the cooperation of the `LongRunningProcess` to regularly check if it should pause: that would need to be implemented for each operation independently (see the first sketch after this list).
- restarting operations after restarting OpenRefine: implementing that in 3.x feels really hard and hacky to me.
- partially computed facets with the newly introduced cap on rows (illustrated in the second sketch below): validating the UX in 3.x could be done, but again it's a feature that mostly makes sense for large projects, which 3.x is not very good at handling anyway, so I am not sure what conclusions you could really draw from such an experiment.
- the error handling of the Wikibase editing operation: the design for this came to my mind because I was forced to rework the Wikibase upload operation to fit the new architecture. It is true that once the design is found, there is nothing preventing us from implementing it in 3.x (although it's some work to backport it).
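
To make the cooperation point concrete, here is a minimal sketch of what each operation would have to do in a 3.x-style process. The class and method names are hypothetical, not the actual `LongRunningProcess` API:

```java
// Minimal sketch of cooperative pausing; names are hypothetical,
// not the actual OpenRefine 3.x API.
public class PausableProcess {

    private volatile boolean paused = false;

    public void pause() {
        paused = true;
    }

    public synchronized void resume() {
        paused = false;
        notifyAll();
    }

    // Every operation's inner loop must call this regularly;
    // an operation that forgets the call simply cannot be paused.
    protected synchronized void checkPaused() throws InterruptedException {
        while (paused) {
            wait();
        }
    }

    // Example: each operation has to weave the check into its own loop.
    public void processRows(int totalRows) throws InterruptedException {
        for (int row = 0; row < totalRows; row++) {
            checkPaused();
            // ... process one row here ...
        }
    }
}
```

The `checkPaused()` call is exactly the part that would have to be duplicated in every single operation, which is what makes this approach so unattractive in 3.x.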
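And a rough illustration of the row cap on facets, again with made-up names: the facet is tallied over at most `rowCap` rows, and the result is flagged as partial so the UI can indicate that it only covers a sample:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: tally facet values over at most rowCap rows
// and record whether the result covers the whole project.
class CappedFacet {
    final Map<String, Integer> counts = new HashMap<>();
    boolean partial = false;

    void compute(List<String> cellValues, int rowCap) {
        int limit = Math.min(cellValues.size(), rowCap);
        for (int i = 0; i < limit; i++) {
            counts.merge(cellValues.get(i), 1, Integer::sum);
        }
        partial = cellValues.size() > rowCap;
    }
}
```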
Concerning performance on small datasets: I am glad we agree this is crucial.
The thing is: this sort of performance really depends on users' workflows, so to benchmark it meaningfully I think it really helps to have all features available in the prototype. I don't think users are very enthusiastic about trying out a version of OpenRefine where they cannot carry out the workflows they are used to because a key operation they rely on is missing. It's more than that: I think users are not very motivated to try out an alpha version of a new architecture which is merely on par with the features of 3.x, but does not yet include any of the user-facing benefits of the new architecture beyond scalability of the backend. So that's why I thought it was worth working on support for partially computed operations: I think users will see the point and will be more keen to try it out.
I also tried to avoid premature optimization. There is no point spending months of effort profiling the architecture to make sure it is always faster than 3.x if we are not sure this architecture can actually deliver the features users need.
So: yes, in its current state, the 4.0 branch will behave worse than 3.x in some situations. That's why I want to ship an alpha version: to get more feedback about what those situations are.