User-defined Clustering Project

I like the general design philosophy here.

We should provide more contrast between the buttons and the surrounding background.
Deeper gradients?
Thicker black/grey borders?

What does it look like with 10+ custom keying/distance functions?

  • Does it scroll well within the inner box and keep the header sticky?

I also like it! I think there will be small style tweaks to do, to align with the design of similar components elsewhere in the UI, but that's something we can do in the corresponding PR directly.

Perhaps one slight change I could already suggest is moving the "Add new keying function" button so that it's more clearly part of the table (inside the rectangular frame rather than in the dialog footer), but that's just an intuitive preference. It's true that the dialog to manage reconciliation services also has the "Add" button in the dialog footer after all… So I am not completely sure!

I am glad you liked the design. Thanks for your feedback and suggestions. @thadguidry, @antonin_d

I honestly prefer having the add button in the dialog footer. What do others think?

Thanks! Once I finish changing the style of the button, I'll send it here to get your opinion.

I agree, the scrolling idea makes sense to me; I'll definitely implement it. Regarding keeping the header sticky, I'm also unsure how it would benefit the user. What are your thoughts on that?

@zyadtaha The sticky header I was referring to is the Name | Action header row, so that users continue to see it while scrolling the list of functions.

Sorry for joining the conversation late, but regarding design guidelines, we have a design system available in Figma. This should help guide decisions on colors, button sizes, and other visual aspects.

@thadguidry I understand what you meant, but I don't think it's that important/beneficial for users to always see the header while scrolling. For example, the header isn't sticky in the expression preview dialog.

Excited to hear your opinions and feedback.

Design proposal for the "view cluster" feature

  • Removed the History, Starred and Help tabs and added the "Preview clusters" tab.
  • Increased the table height by 50px for better display without too much scrolling.
  • Are the names "Preview expression" and "Preview clusters" suitable?

I like this design @zyadtaha but I don't quite understand why the other tabs (History, Starred, Help) were removed. These options would still be very helpful (especially "Help"!) when writing a custom clustering function.

@ostephens We plan to include a small description or an example showing how to write the function, in addition to a link to the documentation (once it is written). However, your point makes sense. I removed those tabs because I thought they would take up too much space and crowd the tab bar.

I think this window can be quite large. We don't really need to see other information at the same time. I think the current expression editor could also be larger than it is: I'm not sure what the purpose of restricting it is, as all the information you need is in the dialog.

If we are worried, we could collapse the tab options into a dropdown select or similar, but I think we can just make the window use more width to be honest

Could you expand on this? Not sure I fully understand you.

How would that free space? I think it would be the same space.

You mean "Height" not "Width", right?

I thought your concern about keeping the other tabs (History, Starred, Help) was that the tab bar would be crowded? But if you make the dialog wider there is plenty of space.

At the moment if I trigger the transformation editor in Chrome it is very small compared to my full screen:

But I don't see why we restrict the size so much. When I'm using this window I don't really need to see the project greyed out in the background. There may be some exceptions (it's sometimes useful to see the column names and the project name), but everything else I need is in the transformation dialog window, so I don't see any reason why this screen shouldn't be substantially larger, in both width and height.

Great! I get your points now. Thank you so much for this great feedback @ostephens

After trying out the current implementation in the PR, I have a few more minor suggestions.

When opening the dialog for the first time, the list of custom functions is obviously empty. It would be nice to have some sort of placeholder saying "No custom functions yet" or something like that. You can look at how it's done in the project list for instance (run OpenRefine on a fresh workspace and go to the "Open project" tab).

I also wonder whether the grey border around the tabs is really needed, given that the tabs already come with their own border. I can see that the dialog to transform a column also has it, so I imagine it is inherited from there. What do people think: is it actually useful? Because it runs directly alongside the tabs border, without any margin and in a different color, I find it rather inelegant, so I would be tempted to remove it (both from the existing "transform" / "add column based on this column" dialogs and from this new dialog). Perhaps one can argue that it has a purpose in the existing dialogs because it separates the tabs from other contents in the dialog, but even then I am not really convinced.

Also, I would be in favor of adopting a consistent terminology across the feature. The existing clustering dialog lets the user choose between "Key collision" and "Nearest neighbor" as the selected "Method".
This reads quite differently from "keying" and "distance" used in the custom functions dialog. I wonder if we can improve that, although it's not so clear to me what a better proposal would be.
In any case, I think "keying" is a bit of an odd term for end users and will be difficult to translate. On Wiktionary none of the listed senses match what we are using it for here.

In our meeting on Monday @zyadtaha pointed out that the "keying function" and "distance function" terms are indeed already used in the UI, just to the right of the "Method" selector I was mentioning, so I think it's fine to keep it as it is.
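For readers less familiar with the two terms: a keying function maps each cell value to a key, and values that share a key end up in the same cluster (hence "key collision"). Here is a minimal Python sketch of that idea. The names `simple_fingerprint` and `key_collision_clusters` are made up for illustration, and the key is a simplified fingerprint-style normalization, not OpenRefine's actual implementation.

```python
import re
from collections import defaultdict

def simple_fingerprint(value: str) -> str:
    """Simplified, illustrative keying function (not OpenRefine's exact fingerprint)."""
    # Lowercase, turn punctuation into spaces, then sort and de-duplicate the tokens.
    cleaned = re.sub(r"[^\w\s]", " ", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))

def key_collision_clusters(values, keyer):
    """Group distinct values by key and keep only groups with more than one value."""
    groups = defaultdict(set)
    for v in values:
        groups[keyer(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]

clusters = key_collision_clusters(
    ["Tom Cruise", "Cruise, Tom", "tom cruise", "Tom Hanks"], simple_fingerprint
)
print(clusters)
```

The three spellings of "Tom Cruise" collide on the same key and form one cluster, while "Tom Hanks" stays on its own.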

A point to discuss is the preview for the distance expression, which isn’t very helpful right now: a distance is computed from two values, while the current preview shows only one value per row.

Antonin suggested that we handle the preview of distance clustering expressions by providing two input fields where users can manually set the values for value1 and value2. These fields could be prefilled with the first two values found in the column. This approach would allow users to see the result of the evaluation on these values and make it clear that value1 and value2 are available in the evaluation context. I felt that, in addition to this, we should allow users to see more examples to better understand how the expression behaves with different values.
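To make the suggestion concrete, here is a rough Python sketch of the preview logic. Everything in it is an assumption for illustration: `default_pair` and `preview_distance` are hypothetical names, Levenshtein distance stands in for whatever expression the user writes, and "first two values" is interpreted as the first two distinct non-empty values.

```python
from typing import Callable, Optional, Sequence, Tuple

def levenshtein(a: str, b: str) -> int:
    """Stand-in distance function; the user's own expression would go here."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def default_pair(column: Sequence[Optional[str]]) -> Tuple[str, str]:
    """Prefill value1/value2 with the first two distinct non-empty values (assumption)."""
    seen = []
    for cell in column:
        if cell and cell not in seen:
            seen.append(cell)
        if len(seen) == 2:
            return seen[0], seen[1]
    # Fewer than two distinct values: pad with empty strings.
    padded = seen + ["", ""]
    return padded[0], padded[1]

def preview_distance(distance: Callable[[str, str], float], value1: str, value2: str) -> dict:
    """Evaluate the distance on one pair, as a two-field preview would."""
    return {"value1": value1, "value2": value2, "distance": distance(value1, value2)}

column = ["Tom Cruise", None, "Tom Cruise", "Tom Cruse", "Tom Hanks"]
value1, value2 = default_pair(column)
result = preview_distance(levenshtein, value1, value2)
print(result)
```

The user could then overwrite either field and the preview would simply be re-evaluated on the new pair.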

So I have made a design proposal and would appreciate your feedback on it.
Here it is:

Nice!
Does it implode :eight_pointed_black_star: or explode :firecracker: if the values are REALLY long strings?

Design looks OK to me. I'd suggest that the second column (with the result) could be displayed in the more standard way with a column header (either the expression or just "Distance") rather than having that repeated for each row.

A possible alternative would be to have two columns for the values ("value1", "value2") to give a more compact tabular layout. With appropriate text wrapping and the ability to change the size of the text box, I think this would deal with situations like longer text. I think it is worth looking at anyway.

Thanks @zyadtaha for the update.

How do we switch the preview between displaying the distance when working on a nearest-neighbor approach and previewing the GREL results (current behavior) when working on key collision clustering (as shown at 16 sec of your recording here)?

I agree

Can we reuse the current preview design (see quick sketch below) by

  • removing the row number
  • adding the distance column at the end

The main question there is: how do you select the pairs of value1 and value2 to run the preview on? Remember that all the values are coming from the same column. The "interesting" pairs of values to use as examples are those for which the values are reasonably close… but how should the system select those?
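One possible heuristic, sketched below in Python purely as a starting point for discussion (it is not something agreed on in this thread): take a bounded sample of distinct values from the column, score every pair with the user's distance function, and show the few closest pairs. The function names, the `k` and `sample_size` parameters, and the `difflib`-based stand-in distance are all assumptions.

```python
import heapq
from difflib import SequenceMatcher
from itertools import combinations

def ratio_distance(a: str, b: str) -> float:
    """Stand-in distance in [0, 1]; the user's own expression would go here."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def closest_pairs(column, distance, k=3, sample_size=200):
    """Return the k closest (distance, value1, value2) triples from a bounded sample."""
    # Distinct non-empty values in first-seen order, capped to keep the
    # quadratic number of comparisons manageable.
    distinct = list(dict.fromkeys(v for v in column if v))[:sample_size]
    scored = ((distance(a, b), a, b) for a, b in combinations(distinct, 2))
    return heapq.nsmallest(k, scored)

column = ["Tom Cruise", "Berlin", "Tom Cruse", "Tom Hanks", "Tom Cruise"]
pairs = closest_pairs(column, ratio_distance, k=2)
for d, v1, v2 in pairs:
    print(f"{v1!r} vs {v2!r}: {d:.3f}")
```

The cap on the sample keeps the preview responsive, at the cost of possibly missing close pairs that only appear further down the column.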
