Releasing 4.0-alpha2

I propose to release 4.0-alpha2 soon, finally!

The goal of this release would be to get the new architecture tested more broadly, and gather feedback about various new user workflows:

  • the displaying of partially computed operations
  • the new UI for the process queue
  • pausing and resuming operations
  • restarting operations after restarting OpenRefine
  • error handling of the Wikibase editing operation
  • column resizing
  • partially computed facets with the newly introduced cap on rows

and maybe other things I am forgetting.
To gather this feedback, I am thinking about various options:

  • letting users book 30 min video calls with me to show me what problems they encounter
  • running a survey with questions about the features listed above
  • in the release notes, pointing to a dedicated thread in the forum for feedback

What do you think?

I would like to sync up with master before a release and I might push more bug fixes as I keep testing this version on my side.

2 Likes

Is the 4.0 scope documented somewhere? Is it "only" the items labeled on Github with the 4.0 tag?

1 Like

I think the scope of 4.0 is very much up for discussion. In terms of user-facing features, we have more than enough to make a release. So far this has been mostly a solo effort, but I think we have reached a stage where I would like to invite others to contribute directly to 4.0, especially for changes that would break extensions (so that 3.x stays as stable as possible and gradually becomes legacy). Inviting other contributors there also means being open to including other things in the release scope.

As far as I am concerned, the main points that are holding the release back (in my mind) are:

  • performance of the new architecture, not just for big datasets but also for small ones. Because the new architecture does imply that some workflows on smaller datasets can be slower than in 3.x, and I want to avoid this as much as possible. This means introducing the necessary optimizations and settling on an appropriate caching strategy. I am hoping that the testing on this alpha release will help with that.
  • refining the UX changes introduced by the new features. Again, testing should help.
  • aiming towards some stability for extensions. Because so much has changed and it will be a significant effort to migrate extensions, I intuitively want to make the migration worth it for extension maintainers by offering an extension surface that is as stable as possible. Maybe this means that a stable 4.0 should be based on a new version of Butterfly, built on a recent Jetty and offering the declarative mechanism to connect to extension points discussed in #5664.
1 Like

I'm very interested in many of the improvements already in 4.0 as-is, and I could also see myself contributing directly, especially since some areas would benefit from breaking changes as well as from 4.0-specific UX work.

That said, it would be nice to have an idea of when one would be able to get a return on involvement in 4.0 development (in my case, when 4.0 development would move to master); otherwise it becomes hard to stay motivated.

I think the scope of 4.0 is very much up for discussion.

[...]

As far as I am concerned, the main points that are holding the release back (in my mind) are:

  • performance of the new architecture, not just for big datasets but also for small ones.

This is the key concern that I have.

The other is that the effort has grown from an architectural prototype/spike to include lots of orthogonal stuff which has nothing to do with the architecture experiment. Rather than a series of incremental features which can be evaluated independently, it has grown into a "big bang" effort which is all or nothing. And, because it will represent N years of investment, there will be inexorable pressure to release it regardless of fitness for purpose. It already seems like a fait accompli without any evaluation having been done.

Tom

Thanks Tom - I would definitely do some things differently if I had to start this effort again.

I think one major mistake I made is that I kept all operations, facets, importers and exporters around while implementing the various drafts of this new architecture. Migrating all of those was a major effort, and for early architectural experiments it was not necessary to have them all available. Instead, I could have deleted all of them but a few (say, keeping only the CSV importer, the "add column based on this column" operation, the URL fetching operation, and the text facet, with no exporter at all) and just tried different architectures in that setting.

I am also not happy about the time it took to get to a point where we can ship something to users. It's a long leap of faith and a high risk for the project. It is also stressful for me for sure. That being said I do not want us to simply go for it because of the sunk cost. I count on your high standards and your care about the project to judge this new architecture by its merits. We still have time to make major changes if needed. I am still very grateful that you questioned the coupling to Spark a few months into the project: I had already invested a lot of time into that, but I am glad I listened to you and pivoted away from it.

That being said I don't really understand why you are saying that the 4.0 branch includes lots of orthogonal stuff. What are you thinking about? As I wrote earlier, I think I should have avoided the column resizing feature (which I thought was necessary for a good UX, but it was indeed a stretch), but the other bullet points I have listed above are really dependent on the new architecture from my perspective:

  • the displaying of partially computed operations: I think it would be a really messy thing to implement in 3.x. You'd likely end up with long-running processes mutating the grid as they compute the operation, which would make it a nightmare to ensure correctness and preserve undo support.
  • the new UI for the process queue: I guess you could implement that UI in 3.x quite easily, but I don't think it makes a lot of sense on its own given that queuing processes is not really usable in 3.x (since the grid is not updated as long as the first process in the queue is running)
  • pausing and resuming operations: that could also be done, but it is not very useful if we are not displaying the partial results, I think. Also, it would have to rely on the cooperation of the LongRunningProcess to regularly check whether it should pause, and that would need to be implemented for each operation independently (a rough sketch of such a cooperative check follows this list).
  • restarting operations after restarting OpenRefine: implementing that in 3.x feels really hard and hacky to me
  • partially computed facets with the newly introduced cap on rows: validating the UX in 3.x could be done, but again it's a feature that mostly makes sense for large projects which 3.x is not very good at handling anyway, so I am not sure what conclusions you can really draw from such an experiment.
  • the error handling of the Wikibase editing operation. The design for this came to mind because I was forced to rework the Wikibase upload operation to fit the new architecture. It is true that once the design is found, there is nothing preventing us from implementing it in 3.x (although it's some work to backport it).
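
For the pausing point above, here is a minimal sketch of what such a cooperative pause check could look like if every operation had to implement it itself. This is purely illustrative: the class and method names are hypothetical and do not correspond to OpenRefine's actual LongRunningProcess API.

```java
// Illustrative sketch only: a cooperative pause mechanism that each
// long-running process would have to implement itself in a 3.x-style
// architecture. Names are hypothetical, not OpenRefine's actual API.
public abstract class PausableProcess implements Runnable {
    private volatile boolean paused = false;

    public void pause() {
        paused = true;
    }

    public synchronized void resume() {
        paused = false;
        notifyAll();
    }

    // Each operation must call this regularly between units of work;
    // an operation that never calls it cannot be paused.
    protected synchronized void checkPaused() throws InterruptedException {
        while (paused) {
            wait();
        }
    }

    @Override
    public void run() {
        try {
            for (int row = 0; row < rowCount(); row++) {
                checkPaused();   // cooperation point
                processRow(row); // the actual work of the operation
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    protected abstract int rowCount();

    protected abstract void processRow(int row);
}
```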

Concerning performance on small datasets: I am glad we agree this is crucial.

The thing is: this sort of performance really depends on users' workflows, so to benchmark it meaningfully I think it really helps to have all features available in the prototype. I don't think users are very enthusiastic about trying out a version of OpenRefine where they cannot carry out the workflows they are used to because a key operation they rely on is missing. It's more than that: I think users are not very motivated to try out an alpha version of a new architecture which is just on par with the features of 3.x, but does not yet include any of the user-facing benefits of the new architecture beyond scalability of the backend. So that's why I thought it was worth working on support for partially computed operations: I think users will see the point and will be more keen to try it out.

I also tried to avoid premature optimization. There is no point spending months of effort profiling the architecture to make sure it is always faster than 3.x if we are not sure this architecture can actually deliver the features users need.

So: yes, in its current state, the 4.0 branch will behave worse than 3.x in some situations. That's why I want to ship an alpha version to get more feedback about what those situations are.

antonin_d wrote on June 28:

That being said I don't really understand why you are saying that the 4.0 branch includes lots of orthogonal stuff. What are you thinking about?

Perhaps I have a mistaken impression. In addition to the column resizing, I thought I saw a generic command error handling fix that was targeted for the 4.0 branch (only) recently. Perhaps it was this:

  • the error handling of the Wikibase editing operation. The design for this came to mind because I was forced to rework the Wikibase upload operation to fit the new architecture. It is true that once the design is found, there is nothing preventing us from implementing it in 3.x (although it's some work to backport it).

If that's an isolated example, then it's not a big deal.

I struggle to wrap my head around how anyone is going to effectively review the code, since it amounts to almost 500 commits and multiple years' worth of work. Do you have a strategy in mind?

Tom

Yes, you are right that the generic command error handling is orthogonal too. The reason why I did it in 4.0 is that I have been testing the new architecture quite thoroughly and regularly ran into backend errors that were not exposed by the frontend at all. It felt quite frustrating to debug in those conditions (for instance with the "Working..." screen staying up even though the backend had actually responded with an error).

Although this change was really overdue and totally deserved to be in 3.x as well, I think this is a pretty risky move. Because the frontend has been built around the assumption that all commands return HTTP 200 in all circumstances, changing that means we run the risk that some XHR callbacks don't get executed when they should, and that spurious errors are reported to the user. That would for instance happen if some command throwing an exception is actually "fine" for some user workflow. I have tried to review all the places where we make XHR requests to the backend and avoid this situation, but obviously there are a lot of those, so I think it's quite likely I missed some corner cases. It's also one reason why I have put this off for such a long time, I guess.
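
To make the risk more concrete, here is a minimal sketch of the backend side of such a change, assuming a plain servlet-style command rather than OpenRefine's actual Command base class (the class name, parameter and JSON shapes are illustrative): instead of always replying with HTTP 200 and an error payload that the frontend has to inspect, the command reports failures through a real HTTP status, which generic handlers such as jQuery's ajaxError will also observe.

```java
// Illustrative sketch only, not OpenRefine's actual Command class.
import java.io.IOException;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ExampleCommand {

    public void doPost(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        response.setContentType("application/json");
        try {
            String result = runOperation(request.getParameter("engine"));
            // Success: HTTP 200 with a regular payload, as before.
            response.setStatus(HttpServletResponse.SC_OK);
            response.getWriter().write("{\"code\":\"ok\",\"result\":\"" + result + "\"}");
        } catch (IllegalArgumentException e) {
            // Old style: still HTTP 200, with {"code":"error", ...} in the body,
            // so only callbacks that inspect the payload notice the failure.
            // New style: a real error status, which every XHR error handler
            // (including a generic ajaxError handler) will also see.
            response.setStatus(HttpServletResponse.SC_BAD_REQUEST);
            response.getWriter().write("{\"code\":\"error\",\"message\":\"" + e.getMessage() + "\"}");
        }
    }

    private String runOperation(String engineConfig) {
        if (engineConfig == null) {
            throw new IllegalArgumentException("missing 'engine' parameter");
        }
        return "done";
    }
}
```

The risk described above is exactly the second branch: frontend code that only ever expected the first shape may now have its error callback fire instead of its success callback.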

That's one rule of thumb I have used for similar changes: if it's a chore we need to do which has the potential to break many things or require changes in extensions, do it in 4.x rather than 3.x, the goal being that 3.x remains very stable and that riskier changes get a chance to be tested by more accepting users as part of the broader evaluation effort for 4.x. It's a debatable choice for sure. It is in good part motivated by previous experiences (such as the JSON migration introducing various bugs, and extension maintainers complaining about the frequency of breaking changes).

About reviewing the changes: I agree that the scale of the changes makes it impossible to review diffs directly. I wish I had had more reviews along the way. I have tried to offer high-level summaries of the paths I was taking, to give an opportunity to review the architecture as a whole rather than the nitty-gritty code details, but overall I have not encountered a lot of interest. When I started working on this, you were essentially the only one who engaged with my work, and given your scarce availability it was not doable to get every commit reviewed, let alone architectural choices. So I have got used to being essentially alone in working on the backend. It's not a good habit to have, for sure. I hope we can get to a saner situation once/if this becomes mainline.

I think it would be useful if a draft pull request were opened from 4.0 to master; it would make the work more transparent and accessible for comments.

Let's try it - I am not sure how workable it will be for people to comment in this form, but it does not hurt to try.

Me neither, but if nothing else it could highlight issues that might come up anyway if one moves to merge it with master.

It's a good idea, but with 488 commits (of which GitHub will only show the most recent 250), along with 1,817 files and half a million lines changed, I'm not sure it makes things any more reviewable. In my ideal world this would be some small(ish) number (12? 20?) of feature/refactoring branches which could be reviewed mostly independently in manageable chunks.

For example, the first thing I noticed at a glance is a large namespace change and directory reorganization which pollutes the entire diff, but can probably be, if not ignored, at least reviewed separately. I started a branch to see if I can at least get some of the gross changes out of the diff, but I haven't checked yet to see if it helps.

Tom

1 Like

I agree, and I think your attempt at separating out the namespace change is a very good one, especially since it might bring down the size of the diff.

Feature branches that come to mind include the recent error handling/HTTP status codes as well as the column resizing. I think both of these could also go into 3.*.

Here is a report on this front. I think breaking down the changes into reviewable diffs will be very difficult.
Even without the namespace change, the changes to core classes like Project, History, Row and Cell generate way too much noise on their own to make reviewing those diffs palatable, I would say.

What I think makes more sense is to have really clear (but concise) descriptions of the change at a high level, and review those instead. Because I think there is a lot of overlap between such descriptions and a migration guide for extension maintainers, I have spent more time working on this. Here is how it currently looks:

Note that at the moment all this is integrated as a "migration guide", but I think some of the content should rather go into a general documentation of the architecture and be referred to by the migration guide.

My approach is to have a go at migrating some extensions myself and writing down the changes I am faced with as I go. I have started doing this for the CommonsExtension and am looking into doing this for other ones as well. It's killing two birds with one stone: it's helping me write a coherent description of the changes and it's an occasion to do a migration which will be useful for the extensions themselves.

In the meantime, I think this does not prevent us from getting more hands-on testing of the 4.0 branch, so I am gathering a small panel of testers who I think will have some interest in the changes and could be willing to take this version for a spin. It'd be convenient to have an official alpha release for that, but I can also make custom snapshots independently.

I took a look at the new error handling to see how easy it would be to review a set of changes (maybe not the most relevant ones, but changes I'm very interested in that affect parts of OpenRefine I'm familiar with).

I found it very hard to review, as it's split across many commits, sometimes with unrelated commits in between. It's also sometimes tricky to know whether something broke because of one particular change or because of something unrelated on the 4.0 branch.

Continuing with the error handling example: there are some things I would normally want to comment on and request changes for, but it's way more work than it would be on a feature branch. I also believe that separate branches would make it much easier for you to get early feedback on changes before spending more time than necessary on a particular change; I would, for example, have brought up issues regarding the "catch all" error handling in the frontend as soon as I saw a pull request.*

* Off topic, but so as not to leave you hanging: it removes our ability to do error recovery and custom handling, while also blocking the main thread.

I think that these changes deserve the same level of scrutiny as other changes we merge.

That's one change that I should be able to isolate as a PR, and I'd be very happy to have your feedback on it.
But I am skeptical about backporting it to master, because:

  • it's a risky change indeed, and one that should be tested thoroughly before it makes it into a stable release
  • it has some potential to break extensions or third-party clients (CLI…). Granted, we always state that the HTTP API is internal and does not come with stability guarantees, but for extensions themselves it does have an impact (since they are also affected by the ajaxError event handler).
  • its behaviour is somewhat dependent on broader refactoring work in the backend which introduced specific classes of exceptions (to avoid relying on blanket Exception catching at the command level). I don't see how this sort of work can really be introduced atomically; for me it comes from a general awareness of this need, which I take into account when designing the signatures of new methods (a rough sketch of the idea follows this list).
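
As a rough, purely hypothetical sketch of that last point (the class names below are made up and are not the ones actually introduced on the 4.0 branch), the idea is to throw narrow exception types from backend code so that a command can map each failure to an appropriate HTTP status without a blanket catch (Exception e):

```java
// Hypothetical names, for illustration only.
class OperationException extends Exception {
    OperationException(String message) {
        super(message);
    }
}

class MissingParameterException extends OperationException {
    MissingParameterException(String parameter) {
        super("Missing parameter: " + parameter);
    }
}

class ProjectNotFoundException extends OperationException {
    ProjectNotFoundException(long projectId) {
        super("No project with id " + projectId);
    }
}

final class CommandStatusMapper {
    private CommandStatusMapper() {
    }

    // A command-level helper that picks the HTTP status from the exception
    // type, instead of treating every failure as an opaque server error.
    static int statusFor(OperationException e) {
        if (e instanceof MissingParameterException) {
            return 400; // bad request
        }
        if (e instanceof ProjectNotFoundException) {
            return 404; // not found
        }
        return 500; // internal server error
    }
}
```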

I think breaking down the changes into reviewable diffs will be very difficult.
Even without the namespace change, the changes to core classes like Project, History, Row and Cell generate way too much noise on their own to make reviewing those diffs palatable, I would say.

What I think makes more sense is to have really clear (but concise) descriptions of the change at a high level, and review those instead.

The design should definitely be reviewed, in addition to the code, but I don't think a description of an (intended) change is a suitable proxy for a diff of the actual change in terms of review.

I took a look at the new error handling to see how easy it would be to review a set of changes [...]

I found it very hard to review, as it's split across many commits, sometimes with unrelated commits in between. It's also sometimes tricky to know whether something broke because of one particular change or because of something unrelated on the 4.0 branch.

[...]

I think that these changes deserve the same level of scrutiny as other changes we merge.

I agree and I'm struggling to see how we make that happen. It seems like there are three possibilities:

  1. Break the changes into feature branches of a manageable size which can be reviewed (mostly) independently. This is how we've traditionally worked.
  2. Attempt to review the entirety of the changes en masse. This would probably require reviewers to dedicate several weeks, basically full-time, to have any chance of a cohesive review, and it would likely still fall short of the quality of #1. Maintaining focus for that period of time and avoiding burnout would be a real factor.
  3. Skip the code review entirely and merge hundreds of commits without any review.

I think #3 is a terrible idea and #2 is difficult to achieve with a part time volunteer development team, which leaves us with #1. Although it may be difficult, I don't really see any other option unless someone else has other ideas. Does anyone?

Although breaking things into reviewable chunks may be difficult, I'm sure there are more and less difficult versions of it, so the question then becomes what's the least difficult way to organize this work?

I don't mean to ignore or minimize Antonin's earlier comments about the stress of working alone without supervision or peer review. I also recognize that this work represents a large investment of time and money.

Tom

I fully share Tom's views here.

Hi both,

It's great that you are interested in reviewing this work!
I haven't had much time to work on this recently, but it is something I intend to have a go at: trying to find some meaningful points in the commit history to break the changes down into a few logical steps.
I'll report back once I have more concrete things to show in this direction.

Hi @tfmorris and @abbe98,

I have made an attempt to split my work into reviewable chunks and opened PRs for that on my fork (so that I don't flood the PR list of the official repository):
https://github.com/wetneb/OpenRefine/pulls

The PRs currently open there amount to the first half of the commits on the 4.0 branch (which I have been reorganizing quite thoroughly), so there would need to be a bunch of other pull requests that would go on top of that if you like this approach. Note that to present the rest of the commits in a similar fashion, I will likely need to shuffle some of them into the current chunks (for instance for later fixes which address problems which were already present in earlier stages) so please don't invest too much time in reviewing the PRs as they stand yet - at this stage I am mostly looking for feedback about whether this approach works for you.

The PRs are opened in their logical order, and each depends on the previous one, so I have introduced ad-hoc branches (4.0-point-0, 4.0-point-1, and so on) that let us see the difference between each step cleanly, without duplicating commits from the previous PR.

The main PR is the third one: Refactor the core data model by wetneb · Pull Request #336 · wetneb/OpenRefine · GitHub
I explain there why it seems difficult to split this one further, but if you can identify changes that you'd like to split out let me know.

1 Like