Working group for reconciliation user interfaces

Hi :wave: At the BarCamp in Berlin we agreed to set up a working group (WG) for reconciliation user interfaces. We have not yet agreed on working practices, so I would like to invite the initial team, @lozanaross, @mack-v and @thadguidry, to agree on how the work will be organized.

I have created two screens myself (as braindumps). One is for reconciliation settings, and the second is for a multimodal dashboard that combines a selection of views (property comparison, map interface, timeline, website preview, recon settings) to make judgements about the items to reconcile. I can imagine the tool being plugged into many different environments, taking tabular data as input. So it could be possible to develop it without too many strings attached, so that it serves other environments as well.

I am hesitant to share the screens publicly at this point before we agree on our working environments and other constraints.

Let's continue the discussion!

Cheers, Susanna

OK. I will start spamming with sketches to bring the ideas forward. I think a reconciliation interface could be an app that works with many different underlying programs (OpenRefine, Google Sheets, local Excel and others). I don't yet know what the exact requirements for data input and output would be, but I have a hunch it's doable.

Making educated manual decisions about reconciliation requires comparing the local and remote data in many ways. I am designing for the Wiki* environment here, assuming the use of the reconciliation API, but possibly extended with some other features if possible.

Here are design proposals for full-screen interfaces. I have not taken any OR design frameworks into account; these designs are there to convey my ideas for functionality.

Settings
The settings would consist of several independent setting sections that could be switched on or omitted.

Each section would host a number of options. These have not been precisely defined: there are constraints with the recon service and a wide variety of possible variables. The settings could fix many current problems in the way the data is added, e.g. use an exact match with an authority ID and skip text recon; allow constants as values; control how strictly each piece of data is required to match, and tune this for individual items or refine the settings during the manual process; and maybe use AI to learn from the tweaks.
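To make the idea of independent, switchable sections concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the class and field names are hypothetical, the `exact_id` mode is a client-side shortcut rather than part of any recon service, and only the `query`, `type`, `limit` and `properties` (`pid`/`v`) keys follow the shape of a reconciliation query.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExactIdSection:
    """Hypothetical section: match directly on an authority ID column."""
    column: str  # local column holding the identifier
    pid: str     # remote property that stores the same identifier


@dataclass
class ReconSettings:
    """Each section is optional; None or empty means it is switched off."""
    type_id: Optional[str] = None
    exact_id: Optional[ExactIdSection] = None
    constants: dict = field(default_factory=dict)     # pid -> constant value
    column_props: dict = field(default_factory=dict)  # pid -> local column
    limit: int = 5


def build_query(row: dict, label_column: str, settings: ReconSettings) -> dict:
    """Turn one local row into a query description for the client to send."""
    exact = settings.exact_id
    if exact and row.get(exact.column):
        # Assumption: the client resolves this itself and skips text recon.
        return {"mode": "exact_id", "pid": exact.pid, "v": row[exact.column]}

    properties = [{"pid": pid, "v": v} for pid, v in settings.constants.items()]
    for pid, column in settings.column_props.items():
        if row.get(column):  # skip empty cells instead of sending blanks
            properties.append({"pid": pid, "v": row[column]})

    query = {"query": row[label_column], "limit": settings.limit}
    if settings.type_id:
        query["type"] = settings.type_id
    if properties:
        query["properties"] = properties
    return query
```

Because every section defaults to "off", per-item tuning during manual review could be done by copying the settings object and changing one section for that row only.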

The top part of the reconciliation window would display the data of a single cell similarly to the Search for match interface. You could edit the input, create one new item or an item for all similar values, or not reconcile.

The bottom part would list the candidates on the left and the recon interfaces on the right. You could use any of them to make the match. You could add, maximize, minimize or close interfaces and change which data they use on the fly. The settings would be one of the possible components.

Interfaces that I have sketched are

Comparing web pages

Comparing coordinates

Comparison of individual properties for all candidates

Display of all available properties for a candidate

These designs are in Figma and can be made collaborative, if there is interest.

This is fantastic, thanks a lot for this amazing work!

I really like the idea of letting the user configure multiple panels to display the information they need while matching. The ability to show external websites other than the database being reconciled to would be really great. For instance, when matching to Wikidata I can definitely relate to the need to see not just Wikidata, but also any Wikipedia article associated with the item (assuming it's in a language I know). Beyond that, you could also want to show external websites associated with the Wikidata item (via an external ID property). Intuitively, this would make sense not just for the Wikimedia environment but when matching to all sorts of databases: VIAF is an obvious one, since it doesn't hold much metadata of its own but has a lot of references to other databases.

Concerning the configuration of the reconciliation process, the proposal you are making is pretty well aligned with the direction that has been taken in the recon dialog redesign:

The idea of having reconciliation for multiple programs beyond OpenRefine (such as Google Sheets or Excel) is also very aligned with our efforts to document the API as something that can be used independently of OpenRefine.

Thank you for the positive comments! I hope we get conversations going, and maybe move forward with some practical actions. :steam_locomotive:

One thing is to make a reality check of the features; if they cannot be realized, maybe they can inform API development. Recon settings and the complexity of that data are one tricky part, and the data exchange with the underlying software is another.

Secondly, I hope we get more eyes on the UI and functionality.

And finally, get a project together and start working on it!

First off, WOW @Susanna_Anas ! Great creative work!

One concern that we had back in the day was interactively filtering large lists of candidates, because sometimes a large result set of candidates is returned or requested by a client, or simply because the service offers deeply enriched data. Imagine that it's chemical compound sequences or drug formularies being matched, rather than commonly named entities with language labels. So there would need to be filter controls for the candidate results themselves (a typical drug formulary might match against 30+ candidates, depending on the initial broad properties selected). The filter controls need to be easily accessible to reduce 30+ candidates in the list down to a handful, via properties quickly toggled on/off. For instance, one panel could hold a set of accordion dropdowns, with each collapsible tier of the accordion sub-selecting the properties contained under a property parent group (in the Recon API this would be a property's type/category, which we don't have currently).
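As a rough sketch of what such toggled filtering could look like on the client side (Python, with hypothetical data shapes: candidates carry a `props` dict, and property descriptors carry a `category` field that the Recon API does not currently provide):

```python
def filter_candidates(candidates, active_filters):
    """Keep candidates that satisfy every filter that is toggled on.

    candidates: list of dicts like {"id": ..., "props": {pid: [values]}}
    active_filters: dict of pid -> set of accepted values; a property that
    is toggled off is simply absent from the dict.
    """
    kept = []
    for cand in candidates:
        props = cand.get("props", {})
        if all(accepted & set(props.get(pid, []))
               for pid, accepted in active_filters.items()):
            kept.append(cand)
    return kept


def group_properties(properties):
    """Group property ids by the assumed 'category' field, one accordion tier each."""
    groups = {}
    for prop in properties:
        groups.setdefault(prop.get("category", "Uncategorized"), []).append(prop["id"])
    return groups
```

With no filters active the full candidate list comes back unchanged, so toggling a property on can only narrow the list and toggling it off restores the previous view.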

@antonin_d We'd likely need to account for adding a type/category field to a property, instead of only the current id, name and maybe other things I'm not thinking of, in order for better interactive candidate filtering to work well.

Thank you for the response! I would like to see different kinds of filtering options that could be made more permissive or strict depending on circumstances and the quality of data while working on the dataset. It could inform the batch process or only the reconciliation process of a single entity. The need for data for the initial candidate retrieval differs from the extensive need for additional data for manual checking of candidates.

I am also keen to work on this in the context of different environments and applications, so that there would be minimal need for integration with the underlying software. Also, it would be interesting to experiment with combining different search methods to retrieve results (Recon API, SPARQL, using Wikimedia-specific data such as categories or links...).

I had a chat with ChatGPT and I have a rudimentary proto-prototype. I hope that it turns out good enough to be shared.

I have worked on an actual working prototype. It presents more ideas than it implements: only a couple of displays are actually functioning.

https://avoinglam.github.io/ReconUI/index.html

Quick instructions:

Type the term that you are reconciling into the input box. Candidates will be shown in the left pane, and further information will be shown in the two panes on the right. This is a stop-gap for testing; eventually the idea is to navigate the underlying dataset.

Two interfaces are functioning

  1. Display of a Wikimedia-based site: Wikidocumentaries (not itself a Wikimedia site, but it displays information drawn from Wikimedia projects).
  2. Map of the candidates that have coordinates (with the caveat that a result set containing extraterrestrial coordinates currently cannot be displayed).
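One possible way around the extraterrestrial-coordinates problem is to drop non-Earth points before handing the result set to the map. The sketch below is in Python rather than the prototype's own code, and it assumes coordinates arrive as WKT literals from the Wikidata Query Service, where a non-Earth globe is given as a leading `<globe URI>` before `Point(lon lat)`. The `coord` and `latlon` keys are hypothetical.

```python
import re

EARTH = "http://www.wikidata.org/entity/Q2"
# Optional "<globe URI>" prefix, then "Point(longitude latitude)".
_WKT = re.compile(r"^(?:<([^>]+)>\s*)?Point\(\s*(\S+)\s+(\S+?)\s*\)$", re.IGNORECASE)


def parse_earth_point(wkt):
    """Return (lat, lon) for a coordinate on Earth, or None otherwise."""
    match = _WKT.match(wkt.strip())
    if not match:
        return None
    globe, lon_text, lat_text = match.groups()
    if globe and globe != EARTH:
        return None  # e.g. a crater on Mars or the Moon
    try:
        lat, lon = float(lat_text), float(lon_text)
    except ValueError:
        return None
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return (lat, lon)


def mappable(candidates):
    """Keep only candidates with a usable Earth coordinate, adding 'latlon'."""
    result = []
    for cand in candidates:
        point = parse_earth_point(cand.get("coord", ""))
        if point is not None:
            result.append({**cand, "latlon": point})
    return result
```

Candidates that are filtered out here would still appear in the candidate list; they would just not get a marker on the map.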

Further interfaces proposed in the concept

  • Display of web pages based on data either in the dataset or in a Wikidata property of the candidate, using either a URL constructed from base URL + ID or a full URL
  • More Wikimedia project displays (Wikipedias in different languages, display of several properties, etc.)
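The "base URL + ID or full URL" rule from the first bullet could be sketched like this in Python. The function name is hypothetical, and the `$1` placeholder is borrowed from the convention used by Wikidata formatter URLs; a plain base URL without a placeholder is assumed to take the ID as a suffix.

```python
from urllib.parse import quote


def preview_url(value, formatter=None):
    """Return a URL to show in a preview pane, or None if none can be built.

    value: a cell value or property value, either a full URL or a bare ID.
    formatter: base URL, optionally containing "$1" where the ID belongs.
    """
    value = value.strip()
    if value.startswith(("http://", "https://")):
        return value  # already a full URL, use as-is
    if not value or not formatter:
        return None
    encoded = quote(value)  # keeps "/" unescaped, which some IDs rely on
    if "$1" in formatter:
        return formatter.replace("$1", encoded)
    return formatter + encoded
```

For example, `preview_url("12345", "https://example.org/id/$1")` gives `https://example.org/id/12345`, while a value that is already a full URL is passed through untouched.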

The search method used is Wikidata SPARQL with mwapi text search returning QIDs. Further properties could be used to limit the results and tweaked when evaluating individual items.
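For readers unfamiliar with this kind of search, here is a minimal Python sketch that builds such a query and pulls QIDs out of a standard SPARQL JSON response. It is not the prototype's code: which mwapi mode the prototype uses is not stated, so `EntitySearch` is an assumption, and the function names are hypothetical. No network call is made here.

```python
def build_search_query(term, language="en", limit=20):
    """Build a Wikidata SPARQL query that text-searches via the mwapi service."""
    if not language.replace("-", "").isalnum():
        raise ValueError("unexpected language code")
    # Escape characters that would break out of the SPARQL string literal.
    escaped = (term.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", " "))
    return f"""
SELECT ?item ?itemLabel WHERE {{
  SERVICE wikibase:mwapi {{
    bd:serviceParam wikibase:endpoint "www.wikidata.org" ;
                    wikibase:api "EntitySearch" ;
                    mwapi:search "{escaped}" ;
                    mwapi:language "{language}" .
    ?item wikibase:apiOutputItem mwapi:item .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}
}}
LIMIT {int(limit)}
""".strip()


def extract_qids(sparql_json):
    """Pull QIDs out of a SPARQL JSON result with an ?item variable."""
    qids = []
    for binding in sparql_json.get("results", {}).get("bindings", []):
        uri = binding.get("item", {}).get("value", "")
        qid = uri.rsplit("/", 1)[-1]
        if qid.startswith("Q"):
            qids.append(qid)
    return qids
```

Further properties to limit the results would go into the same `WHERE` block as extra triple patterns on `?item`, which is where per-item tweaking could hook in.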

Further search methods could include

  • Reconciliation API
  • Web search limited to Wikipedia articles, returning QIDs

Data sources could include

  • Google Sheets (I have tested this, but it is not working in the prototype)
  • OpenRefine via API
  • Data import (csv, tabular data)

Love this work @Susanna_Anas !
There are also some alternatives, well, maybe not real alternatives to your work in the true sense, that work through browser extensions.

It's somewhat like what we had loosely thought of for OpenRefine in its very early days (a side panel for recon). I use it in this way: once I have reconciled and am being offered suggestions in OpenRefine, I can just hover over (or click, depending on the technique the extension offers) each suggestion's blue link, and the extension will automatically preview it in the side panel.

There are other link preview extensions, but I think the best one personally on Edge has to be the built-in one from Microsoft themselves as part of the new split screen functionality - which now has a "link tabs" option that can be enabled. So I keep OpenRefine on the left panel and then click on links inside suggestions and they appear on the right panel.

  1. enable split screen mode in Edge then activate its button on the toolbar

  2. click the "More options" ellipsis (...) in the top center of the split screen

  3. enable "Link tabs"

Browser extensions, however, are always built for general browsing cases, whereas your work is specifically tuned towards reconciliation with OpenRefine, which is the ideal and can be further customized to fit narrow use cases and workflows. Kudos.

Thanks for the comments, and sorry for the lag. I had not checked back earlier.

I have not worked further on this in the meantime. I think for immediate purposes, it may be useful for triggering ideas about screen real estate. In the longer run, it only makes sense to develop it further with more specific contexts in mind.

I have dedicated some time and attention to this kind of work in my plans for next year, and I continue to be interested in developing recon interfaces for OR and other data tools and repos.