Operation registration improvements

I want to share some thoughts I have been mulling over for a while, about how we could improve the way the core tool and extensions register operations. I want to make a case for improving the operation registry to support a few use cases.

When registering an operation, we currently only register the Java class for the operation in the backend. Unlike the importers, where we register in the same go the Java class that implements it and the HTML/JS files that implement the UI to configure the importing settings, operations only come with a backend-side component. If we could register their UI in the same go, I think it would enable a range of interesting use cases.

Which use cases do I have in mind?

  • The openrefine-command-palette, an extension by @abbe98 which lets users execute operations without going through the column menu, but by prompting them from the keyboard instead. It's a really nice UX, I encourage people to try it out. To achieve this, the extension needs to explicitly list each supported operation and associate it with:

    • A human-readable name and description for the operation
    • How to create the UI to configure it (for instance the new ReconDialog(column) code for the reconciliation operation)
    • Which bits of information need to be prompted from the user before showing them the configuration UI (typically, the name of the column to run the operation on)

    You can see what it looks like in the project-context.js file. The problem is: @abbe98 needed to explicitly list all operations there together with the appropriate bits of UI, because that information is not currently available from the "official" operations registry. This has several downsides: if we implement a new operation, it will not be available in the command palette until someone updates the extension. Also, if we change the UI for configuring an operation (like we have recently done for reconciliation), this can break the command palette at the same time.

  • Other ways to provide alternatives to our clunky column menu: for instance, by offering all available operations in a tool bar (independently of the columns), each represented by an icon accompanied by a name / description. The user is prompted for the column to apply the operation on when one of those buttons is clicked. If we wanted to implement something like that, we'd have to re-list all operations in a similar way.
  • In the reproducibility project, one of the improvements we are considering is to make it possible to change the settings of a previous operation. Imagine a little "pencil" button in each entry of the history list that would let you reconfigure a previous step: for instance, change the settings used for reconciliation, tweak the GREL expression used to transform a column, or add a missing header to a URL-fetching operation. Making such changes would trigger the recomputation (or discarding) of the later operations that would be impacted by the change. To implement something like this, we need a uniform way to generate the configuration UI from an operation.

So, how could we address those use cases? When registering an operation, we could also provide:

  • a short name for the operation (as an i18n key, so that it's localizable?)
  • a short description of the operation (similarly localizable)
  • an icon for the operation, to be used in menu items or history list
  • a JavaScript function which shows the UI for this operation. As parameters, it could take an existing operation configuration to pre-initialize the dialog with, a callback to call with the final configuration once the user has validated the dialog, and perhaps other things.
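
To make the list above a bit more concrete, here is a rough sketch of what such a registration could look like. Everything in it is hypothetical: registerOperation, configureOperation, the i18n keys, the icon path and the showUI signature are names made up for illustration, not an existing OpenRefine API.

```javascript
// Hypothetical registry sketch: none of these names exist in OpenRefine today.
const operationRegistry = new Map();

function registerOperation(id, { nameKey, descriptionKey, icon, showUI }) {
  if (operationRegistry.has(id)) {
    throw new Error(`Operation already registered: ${id}`);
  }
  operationRegistry.set(id, { id, nameKey, descriptionKey, icon, showUI });
}

// Example registration. The i18n keys and icon path are placeholders.
registerOperation("core/text-transform", {
  nameKey: "operations/text-transform/name",
  descriptionKey: "operations/text-transform/description",
  icon: "images/operations/text-transform.svg",
  // existingConfig: configuration to pre-fill the dialog with (or null)
  // onConfirm: called with the final configuration once the user validates
  showUI: (existingConfig, onConfirm) => {
    // A real implementation would open a dialog here.
    // This stub confirms immediately with the pre-filled configuration.
    const config = { ...(existingConfig || {}), op: "core/text-transform" };
    onConfirm(config);
  },
});

// A command palette, a toolbar or a history "pencil" button could all
// go through the same entry point to show the configuration UI.
function configureOperation(id, existingConfig, onConfirm) {
  const entry = operationRegistry.get(id);
  if (!entry) {
    throw new Error(`Unknown operation: ${id}`);
  }
  entry.showUI(existingConfig, onConfirm);
}
```

With something like this, re-configuring a previous step would amount to calling configureOperation with the stored configuration of that step and re-running the history from there in the callback.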

@zoecooper is looking into how hard it would be to come up with an icon set for our current operations, looking at existing icon sets or creating our own.

One problem we have is how to integrate into this picture the column(s) an operation is applied to. Given that currently, operations are generally triggered from the column menu, the dialogs that configure them typically don't prompt the user for that column name, since it is already implicit from the column menu used to trigger the operation. For instance, in the command palette, @abbe98 implemented his own pre-prompt for the column name inside the palette, to then be able to re-use the existing UI. A similar solution could be used in the "official" operations registry, but I have the feeling that this would be inheriting a design choice which is not necessarily pointing us to the right architecture in a context where operations are triggered from other contexts.

For instance, consider the "columnize by key/value columns" operation. The configuration dialog only lets the user pick three columns, as configuration for the operation. The column menu used to trigger the operation determines the pre-selection of the "key" column, but that's sort of arbitrary. This is typically an operation that does not belong to a column menu, really.

One could imagine that the dialogs for other operations could also let the user select which column the operation is run on, similarly to this dialog, and so that they could be exposed outside of any column menu.

What do you all think?

So... I prefer to have scroll pickers for Column(s), rather than a dropdown suggest widget where you have to begin typing a Column name. I feel that a scroll list (even if constrained to only 5-6 Column name rows if the dialog area needs to be small) would still be a better UI/UX experience than a single dropdown suggest widget where the user must begin typing, like in a Search box input. If the dialog area in whatever context allows more than 5-6 column names to be displayed, then great!

I guess my preference for any Column picker widget/component is initial mouse movement rather than initial typing?

To be clearer, I am not saying we need to always display 5-6 Column names initially in every dialog... I'm saying the picker can still initially be displayed as a single element, i.e. a default "Pick Column..." label, where the scrollable list of Column names slides down or appears when the user clicks on the "Pick Column" element. Where context dictates, we can expose multiple "Pick Column" elements or pop them into an expanding dialog view where necessary.

The command palette experiment was so well received that it was properly implemented. For most users it even replaces the majority of dialogs, as an "action" can take a series of parameters dependent upon each other (the idea is hinted at by the extension prototype storing "param" types in an array).

For example, the following configures a reconciliation action. It takes a column, a reconciliation service, and an optional datatype (dependent on the selected reconciliation service) to directly trigger reconciliation without any other dialog:

{
    reconcile: {
        name: "Reconcile",
        description: "",
        params: [
            {
                type: "column"
            },
            {
                type: "reconciliation-service"
            },
            {
                type: "reconciliation-datatype",
                description: "'skip' will result in reconciliation against any datatype.",
                skip: true
            }
        ],
        func: (column, rs, externalDatatype) => Reconcile(rs, externalDatatype, column),
    }
}

Or a basic GREL transformation:

{
    grelTransform: {
        name: "GREL Transformation",
        description: "",
        params: [
            {
                type: "column"
            },
            {
                type: "grel"
            }
        ],
        func: (column, snippet) => Transform(column, snippet),
    }
}

The problem is: @abbe98 needed to explicitly list all operations there together with the appropriate bits of UI, because that information is not currently available from the "official" operations registry.

I first tried using the menu configuration to register "actions"; however, this broke down both because of complexity and because of how much each "action" came to depend on parameters. The "solution" was instead to separate the "actions" logic from the menu logic (like how it is upstream today for reconciliation) and register the "actions" twice, once for the menus and once for the command palette. I considered a "central action registry" but skipped it, as a proper implementation would take much more time than managing the two registration points.

For instance, in the command palette, @abbe98 implemented his own pre-prompt for the column name inside the palette, to then be able to re-use the existing UI. A similar solution could be used in the "official" operations registry, but I have the feeling that this would be inheriting a design choice which is not necessarily pointing us to the right architecture in a context where operations are triggered from other contexts.

I can only agree with this if you are limiting the proposal to operations. For context, an "action" in the extension can take either "column", "project", or no parameter. The final implementation, however, in addition to supporting multiple parameters dependent upon each other, also supports two dozen parameter types.


One thing this UI is rather terrible at is feature discovery: for pretty much all features one will still need menu items or something equivalent so that users can discover more than the most popular actions. Given that the upstream/existing menu is not good at feature discovery either, it's still a problem that needs attention.

Not sure if the extension behaves as intended, but it should not require the user to start typing before they can select a column, as long as there are fewer than x columns. I'm not sure the x-item limit is actually necessary for column selection or if one can default to an overflow scroll list. It does affect performance if one tries to dynamically render hundreds of items; in our case I have never seen a project with that many columns, but our GREL library and reconciliation services would cause issues.

I support improving the operation/function registries. Hopefully we can make this one last breaking change for extension writers and keep things compatible afterwards. That probably means thinking through, as best we can, what other potentially breaking changes are on the horizon and bundling them together.

Something that I think would offer opportunities to improve the framework for functions, controls, and operations, is to declare parameters, along with their names, data types, and descriptions. This would allow for more centralized error checking with less duplication and might also eliminate the need to include Javascript UI functions, which feels like a bit of an outlier. This data could also be used to generate help text, power autocomplete, etc.
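
As a rough illustration of the idea, declared parameters could drive both a shared validation routine and generated help text. This is only a sketch under assumptions: the declaration format, the parameter names and the validateConfig / describeParams helpers are all invented for the example and do not correspond to anything in the current codebase.

```javascript
// Hypothetical declaration format; field names are illustrative only.
const textTransformParams = [
  { name: "columnName", type: "string", description: "Column to transform", required: true },
  { name: "expression", type: "string", description: "Expression to evaluate", required: true },
  { name: "repeatCount", type: "number", description: "Number of times to repeat", required: false },
];

// One generic checker that could be shared by every operation,
// instead of each operation re-implementing its own checks.
function validateConfig(declaredParams, config) {
  const errors = [];
  for (const param of declaredParams) {
    const value = config[param.name];
    if (value === undefined || value === null) {
      if (param.required) {
        errors.push(`Missing required parameter: ${param.name}`);
      }
      continue;
    }
    if (typeof value !== param.type) {
      errors.push(`Parameter ${param.name} should be a ${param.type}`);
    }
  }
  return errors;
}

// The same declaration can be reused to generate help text.
function describeParams(declaredParams) {
  return declaredParams
    .map((p) => `${p.name} (${p.type}): ${p.description}`)
    .join("\n");
}
```

The sketch only handles flat, independent parameters; dependencies between parameters would need something richer than this.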

Note that Operations already have a description, but it's only used for generating history entries. The way it's done currently is a little backwards because they are localized in the language of the creator, which might not be the same as the language of a later viewer.

One question in my mind is how much of this stuff should be registered externally vs something that you can ask the objects about. Keeping things like the parameter datatypes together with the parameter handling code seems like it might be a little tidier and less error prone.

Tom

Something that I think would offer opportunities to improve the framework for functions, controls, and operations, is to declare parameters, along with their names, data types, and descriptions. This would allow for more centralized error checking with less duplication and might also eliminate the need to include Javascript UI functions, which feels like a bit of an outlier. This data could also be used to generate help text, power autocomplete, etc.

I hear you, it would be nice to have a declarative description of all operation parameters. However, I have doubts this can really be workable. As @abbe98 mentioned, we'd need support for a lot of different types of input fields, and would need to make it possible to introduce dependencies between fields.

Consider for instance the reconciliation dialog, or even the "mass edit" operation (which does not have its own dialog so far, but if it had one, it would likely be quite similar to the clustering dialog). Those dialogs have a really rich structure so I think it's unlikely we can replace them by something generated from a declarative description of operation parameters and keep an acceptable UX. I think it makes sense for @abbe98 to go down such a route for the command palette, but I don't see such a system completely replace operation configuration dialogs for all our users, especially newcomers.

One area of the tool where we do this (generating a UI from a declarative list of fields) is the "Add column based on reconciled values" dialog, where the user can configure settings for each property fetched. The available settings are declared by the reconciliation service in its manifest. I introduced this system and I don't find it convincing: it is pretty inflexible (for instance, it is not possible to make the available configuration settings depend on the property being configured) and does not add much value overall.

To me, the example of the importers is a better option. The registration of an importer looks like this:

IM.registerFormat(
   "text/line-based/fixed-width", // identifier for the format
   "core-import-formats/text/line-based/fixed-width", // localizable name
   "FixedWidthParserUI",  // frontend component
   new Packages.com.google.refine.importers.FixedWidthImporter() // backend component
); 

In one go, that registers both the backend and the frontend component, with a common identifier and a localizable name. That lets the importer declare exactly how all of its options should be presented to the user, including letting the user interact with the parsing preview (in the case of the fixed width importer: adding draggable lines to let the user define where to split the data. See also the JSON/XML importers…). Clean but flexible, I would say.

I think this can be introduced in a non-breaking way: we can still make it possible to register operations without providing a UI for them at registration time, and let people upgrade if/when they want.
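
A minimal sketch of what "non-breaking" could mean here, assuming a registration call where the UI part is simply optional. The function names, the backend class names and the shape of the ui object are placeholders for illustration, not existing code.

```javascript
// Hypothetical registry where the frontend part is optional.
const registeredOperations = new Map();

function registerOperationCompat(id, backendClassName, ui) {
  // Existing extensions keep calling this with two arguments;
  // newer ones can pass a third argument describing the UI.
  registeredOperations.set(id, { backendClassName, ui: ui || null });
}

// Consumers such as a command palette only list operations that ship a UI.
function operationsWithUI() {
  return [...registeredOperations.entries()]
    .filter(([, entry]) => entry.ui !== null)
    .map(([id]) => id);
}

// Old-style registration: backend only, still accepted.
registerOperationCompat("legacy-op", "com.example.LegacyOperation");

// New-style registration: backend plus UI description.
registerOperationCompat("new-op", "com.example.NewOperation", {
  nameKey: "operations/new-op/name",
  showUI: (existingConfig, onConfirm) => onConfirm(existingConfig || {}),
});
```

Operations registered the old way keep working as before; they are just not offered in contexts that need a configuration UI.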

Also, I'm all in favour of thinking hard about new architectures before rolling out breaking changes on our extension points, but I don't think we can aim for it to be "this one last breaking change": we'll always identify new places where we need to make changes later on…

About bundling breaking changes together, I also got the feedback that it can be overwhelming for extension maintainers to upgrade to a new version with a lot of changes. So it's about finding the right balance.

Personally, my hope is that by encouraging good testing practices, we'd make it easier for extension developers to detect breakages and make the necessary changes. A lot of extensions just don't use unit or integration tests at all. I'd really like to try making an externally developed extension with Cypress integration tests to document the practice. See also the corresponding Outreachy/GSoC proposal.