Desk research on operation history functions

Hi all,

I've collected my desk research on how various tools design their Operation History functions. I've been looking at the user journeys and interfaces of spreadsheet and data tools (Excel, Google Sheets, Talend, Tableau), as well as creative editing tools that are quite different: photo editing tools (e.g. Photoshop) and advanced writing tools (e.g. Scrivener). I'm focusing in particular on the language used, the visualization, and the user experience.

Here's a public Google document collecting my research for all to look through.

Please feel free to discuss below!


The "snapshot" or "save point" idea is similar in concept to a Git commit (with a full index), where the state of a project is saved and, optionally, packaged. Rollback (in databases and other tools) versus a Git checkout of a prior state is an interesting comparison. So is the question of how OpenRefine would work and look if it had something like git revert. What we actually have behaves like git reset whenever a change is made in mid-history. It wouldn't have to work that way if we had non-linear history support, much more like Git with its branching history and commits (snapshots/save points).
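To make the reset-versus-branching distinction concrete, here is a minimal Python sketch. It is not how OpenRefine is implemented; the class names (LinearHistory, BranchingHistory) and the step names are hypothetical, and the branching version assumes every step has a unique name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LinearHistory:
    """Undo/redo list: a new step in mid-history discards the later steps."""
    steps: list = field(default_factory=list)
    position: int = 0  # number of steps currently applied

    def apply(self, step):
        # Everything after the current position is dropped (reset-like).
        discarded = self.steps[self.position:]
        self.steps = self.steps[:self.position] + [step]
        self.position += 1
        return discarded

    def undo(self):
        self.position = max(0, self.position - 1)


@dataclass
class BranchingHistory:
    """Tree of save points: each step remembers its parent, like a commit."""
    parents: dict = field(default_factory=dict)  # step -> parent (None at root)
    head: Optional[str] = None

    def apply(self, step):
        # The new step becomes a child of the current head; nothing is lost.
        self.parents[step] = self.head
        self.head = step

    def undo(self):
        if self.head is not None:
            self.head = self.parents[self.head]

    def path_to(self, step):
        # Walk parent links back to the root, then reverse.
        path = []
        while step is not None:
            path.append(step)
            step = self.parents[step]
        return path[::-1]
```

With three steps applied, two undos, and then a new step, the linear model returns the two later steps as discarded, while the branching model can still reach both the old and the new line of work through path_to.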

@antonin_d Did you ever look into Apache Iceberg to see if it might help us with some of the above?

@zoecooper, thanks for sharing your progress. I think it is important to represent the dependencies between steps. A Gantt chart-type illustration with predecessor tasks could be a useful reference to visualize the workflows.
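As a rough illustration of what "predecessor" could mean here, the Python sketch below derives dependencies from the columns each step reads and writes. The step names and columns are made up for the example, and it only tracks read-after-write dependencies, so it is a simplification rather than a description of OpenRefine's behavior.

```python
from graphlib import TopologicalSorter

# Hypothetical steps: (name, columns read, columns written)
steps = [
    ("trim name", {"name"}, {"name"}),
    ("split address", {"address"}, {"street", "city"}),
    ("uppercase city", {"city"}, {"city"}),
    ("merge name+city", {"name", "city"}, {"label"}),
]


def predecessors(steps):
    """Map each step to the earlier steps that last wrote a column it reads."""
    last_writer = {}  # column -> name of the most recent step that wrote it
    deps = {}
    for name, reads, writes in steps:
        deps[name] = {last_writer[c] for c in reads if c in last_writer}
        for column in writes:
            last_writer[column] = name
    return deps


deps = predecessors(steps)
# Any valid execution order places predecessors before their dependents.
order = list(TopologicalSorter(deps).static_order())
```

Steps with an empty predecessor set (here "trim name" and "split address") are independent of each other, which is the kind of information a Gantt-style view could show as parallel bars.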

I shared the findings from my user interviews regarding the Undo/Redo functionality here: User Interviews Results Part 2: Exploring Feedback Regarding OpenRefine Feature and User Experience.

Here's a public Google document collecting my research for all to look through.

Thanks for summarizing your research! Inline comments don't seem to be enabled on that document, so I'll comment out of band here instead.

  • I think the real distinguishing characteristic of the format that you call "Horizontal Pipeline" is not the direction (horizontal vs vertical), but the fact that it supports multiple inputs & outputs and displays them all in a graphical fashion. Iteration (i.e. loops) is another step beyond that, but I don't know if any of the products that you reviewed support loops.
  • Google Sheets also has a complete version history, which is exposed by clicking the clock face at the upper right corner. All changes for a given day are collected/summarized, but can also be examined individually. Each user gets a separate color for their changes. Each timepoint can be labelled with a name, and you can create a copy of the document at that snapshot or roll back to the snapshot. Copies don't preserve the version history (i.e. a fresh document is created from the snapshot).
  • Neither Google Sheets nor Numbers warns when you're about to wipe out your redo history by doing something other than Redo after the Undo. By sticking with a simpler, less powerful model, they can get away with less handholding of the user.
  • The Talend description seems incomplete. I'm guessing this is just a snapshot of work in progress?
  • There are things in the Tableau screen captures (e.g. Script, Insert Workflow) that hint at more advanced capabilities, which might need to be investigated.
  • The Photoshop help mentions some interesting capabilities, like a paint brush which will erase back to a certain point in the history, and describes the "non-linear history" in a fashion which is different from the mental model I had.
  • There is discussion of various capabilities (statistics, time tracking, proof of work, human-readable (but not executable) provenance, etc.) woven throughout the individual product reviews. It might be useful to pull those out and summarize them as a way of framing the discussion about what is in bounds / out of bounds for the OpenRefine feature(s) you want to design. I understand that perhaps that is intended to come later, after all the individual analyses are complete.

I hope that's useful feedback. As I mentioned in my note about workflow languages, the range of functionality from version history to undo/redo history to full-blown data pipeline / workflow language is quite broad, so the earlier you can define the scope of the work, the more you'll be able to focus.


@Martin yes, I agree that the dependencies between steps are key. That's been the main focus of my sketching since the desk research.

And thank you for sharing the link to your research, it's very helpful!

@tfmorris thank you for your thoughtful feedback and for outlining those points. They are very useful!