This post explores suggestions I collected through the user and contributor interviews I did in 2023. This is the second post in a series of three. The first message focused on governance and is available here: Proposition for the Ambassador Council - #8 by Martin. I will publish a third message to cover feedback on how the community operates.
I tried to connect each suggestion to the relevant conversations and tickets. You can make changes to my message and add more relevant links if needed (I turned this message into a wiki post so anyone can edit it). To prevent conversations from being split into multiple threads, I recommend moving feature-specific discussions to their corresponding tickets or ongoing conversations.
Since my first message, I conducted five more interviews, which updated the breakdown of participants as follows:
Communities they belong to:
- 8 from GLAM,
- 5 from the semantic web,
- 2 from Humanities,
- 1 from OpenStreetMap,
- 1 researcher,
- 3 from other communities.
Type of contributions (one person can contribute in more than one way):
- 15 of them provide training,
- 12 provide support (except for one person, all those involved in support also offer training),
- 6 are developers,
- 5 are extension developers (with only one person overlapping with the developer group),
- 4 are involved in community management.
Two users envision better in-context help for GREL and regex, similar to an online integrated development environment (IDE). They suggest making GREL help more accessible, possibly through a wizard-like approach similar to the auto-suggestions in Google Sheets or Excel.
One user would like to see the option to save facets. I guess you can't use the PermaLink feature if you are a Facet lover… - #3 by Antoine2711
One user would like to visualize boolean values as checkboxes for easy edits. Flags and stars icons hover over text notification - #9 by ostephens
One user would like the option to duplicate existing rows or records. This could be a way to easily create new records or rows in the project. It would be similar to creating an empty row. Issue #556
One trainer reported that new users are often confused by the error message shown when a facet has too many values. They don't know what it means or what to do next.
Four users mentioned that the current menu management is hard to navigate.
- Users who don't frequently use OpenRefine may forget how to perform certain tasks.
- Users understand data cleaning and converting names/entities to URIs but find it challenging to locate the right functions in OpenRefine.
- The nested menu structure in OpenRefine can be complex.
- Suggestions for improvements in the text transformation drop-down menu.
On that front, I wanted to point out the extension developed by @abbe98, which offers an interesting approach to revamping access to transformation features: https://codeberg.org/abbe98/openrefine-command-palette
Another point that stems from those conversations is the option to create custom menus (I am fully cognizant of the challenges this may bring when documenting OpenRefine). This would allow users to:
- Reorganize functions in a way that suits their usage (for example, for some users, reconciliation is important while others never use it).
- Save GREL functions into the menu.
One user reported that the file storage system is somewhat opaque, making locating and migrating project files challenging.
One user is interested in a bulk delete feature for projects. Issue #4965
One trainer would like to see a better explanation of how clustering works so it is less "magic."
Three users expressed interest in improved basic visualizations in OpenRefine to facilitate data exploration and minimize constant back-and-forth with separate visualization tools. Another user went further, suggesting a data validation layer and enhanced data quality metrics, such as standard deviation, minimum, maximum, and the ability to set custom data quality constraints. He refers to Issue #5315. In my conversations about that feature request, users made it clear that they did not expect OpenRefine to produce graphics and advanced visualizations for publication. What they are looking for is a more visual way to explore their data.
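To make the data quality request a bit more concrete, here is a minimal Python sketch of the kind of per-column summary that user described. It is only an illustration and does not reflect anything that exists in OpenRefine today: the `profile` function, the sample column, and the 0 to 100 constraint are all hypothetical.

```python
import statistics

def profile(values, constraint=None):
    """Summarize a column of cell values (hypothetical helper, not an OpenRefine API)."""
    numbers = []
    non_numeric = 0
    for v in values:
        try:
            numbers.append(float(v))
        except (TypeError, ValueError):
            # Cells that cannot be read as numbers are counted, not dropped silently.
            non_numeric += 1
    report = {
        "count": len(numbers),
        "non_numeric": non_numeric,
        "min": min(numbers) if numbers else None,
        "max": max(numbers) if numbers else None,
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(numbers) if len(numbers) > 1 else None,
    }
    if constraint is not None:
        # A custom constraint is any function returning True for acceptable values.
        report["violations"] = [n for n in numbers if not constraint(n)]
    return report

# Hypothetical column content, with one non-numeric cell and one outlier.
column = ["12", "15", "n/a", "14", "250"]
report = profile(column, constraint=lambda n: 0 <= n <= 100)
print(report)
```

A summary like this is meant for exploration only, in line with what users asked for; it is not a substitute for a dedicated statistics or visualization tool.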
One user suggests improving accessibility for screen readers and assistive software.
Four users see room for improvement in the Records mode, making it easier to create and export records. This is a powerful feature that sets OpenRefine apart from other software. See also the conversation Representing hierarchical data: beyond the records mode?.
- Fill down works differently between rows and records mode. Issue #3255
- It is challenging to maintain the hierarchy when importing JSON or XML.
- When importing a JSON file, there is no guidance on selecting the correct JSON level; this is a trial-and-error process.
A total of four users reported using OpenRefine for calling APIs. Two of them would like to see improvements when working with results from JSON web services. One user refers to the "JSON wall": the full pages of JSON results you end up with following an API call. He suggests simplifying the parsing of API results in OpenRefine, possibly by adopting approaches like the XPath and XQuery specifications for JSON. Issue #1440 and Issue #2515
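As an illustration of what a path-based approach to the "JSON wall" could look like, here is a small Python sketch. It is not how OpenRefine parses JSON and it does not implement any particular specification: the `extract` function, the dotted path syntax with `*` as a wildcard, and the sample API response are all assumptions made up for this example.

```python
import json

def extract(node, path):
    """Return every value matching a list of path segments.

    '*' matches every item of a list or every value of an object.
    Hypothetical helper for illustration only.
    """
    if not path:
        return [node]
    key, rest = path[0], path[1:]
    if key == "*":
        if isinstance(node, list):
            children = node
        elif isinstance(node, dict):
            children = list(node.values())
        else:
            children = []
    elif isinstance(node, dict) and key in node:
        children = [node[key]]
    else:
        children = []  # a missing key yields no result instead of an error
    results = []
    for child in children:
        results.extend(extract(child, rest))
    return results

# Hypothetical API response, as it might be stored in a single cell.
response = json.loads('''
{"results": [
  {"id": "Q1", "labels": {"en": "universe", "fr": "univers"}},
  {"id": "Q2", "labels": {"en": "Earth"}}
]}
''')

ids = extract(response, "results.*.id".split("."))
english = extract(response, "results.*.labels.en".split("."))
print(ids)
print(english)
```

The point of the sketch is that a short path expression replaces manual navigation through each nesting level, which is the part users described as overwhelming.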
Three users expressed interest in editing the MARC format in OpenRefine. They are interested in extending OpenRefine's capabilities to support more complex dimensions, like repeating elements in JSON or MARC.
Nine users use OpenRefine to automate repeatable workflows. They are part of universities and research institutions or the GLAM and cultural heritage sector. Several of them maintain recipes used by other team members to load files into knowledge graphs or other repositories. Often, the workflow includes a reconciliation step.
I discussed challenges when creating a JSON workflow that will be reused across many datasets with four users. They reported:
- It is difficult to remove specific steps in JSON and locate where certain steps are in the recipe. (1 user) Issue #2253
- When a step returns no records, it's not recorded or saved in the recipe, which poses challenges when creating a template recipe. (1 user)
- When you create workflows using the JSON history with the step "Re-order/remove columns", any extra columns in a newly imported file are deleted when the workflow runs, even if you want to keep them. The workaround is to move and delete each column individually, which is cumbersome. (1 user)
- With two users, we discussed the concept of catalog data round trips: bringing data into OpenRefine, improving it, and sending it back to the catalog. Bringing OpenRefine's capabilities directly into other software was also proposed, supporting formats like XSLT, Bibtxt, and MARC.
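For readers who maintain recipes outside OpenRefine, the difficulty of locating and removing steps can be worked around with a small script. The sketch below is only an illustration under stated assumptions: the recipe is a simplified, made-up example of the JSON list extracted from the Undo / Redo history (real steps carry more fields), the operation names are assumed to follow the `core/...` naming used in exported histories, and `list_steps` and `drop_steps` are hypothetical helpers.

```python
import json

# Simplified, hypothetical recipe; real exported steps contain more fields.
recipe_json = '''
[
  {"op": "core/text-transform", "columnName": "name",
   "expression": "value.trim()", "description": "Trim name"},
  {"op": "core/column-reorder", "columnNames": ["id", "name"],
   "description": "Reorder columns"},
  {"op": "core/column-removal", "columnName": "notes",
   "description": "Remove column notes"}
]
'''

def list_steps(recipe):
    """Return (index, operation, description) for each step, to locate steps quickly."""
    return [(i, step.get("op"), step.get("description", ""))
            for i, step in enumerate(recipe)]

def drop_steps(recipe, op_name):
    """Return a copy of the recipe without the steps of the given operation type."""
    return [step for step in recipe if step.get("op") != op_name]

recipe = json.loads(recipe_json)
for index, op, description in list_steps(recipe):
    print(index, op, description)

# Remove the reorder step so extra columns in a new file are not silently dropped.
template = drop_steps(recipe, "core/column-reorder")
print(json.dumps(template, indent=2))
```

The resulting `template` can be pasted back into the Apply dialog. Whether dropping a step is safe depends on the later steps that rely on it, so the output should be reviewed before reuse.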
Eleven of the interviewees used one or more reconciliation services.
One user would like to see improved support for SPARQL-based reconciliation endpoints. This is currently offered only via extensions.
One developer would like to see support for data types other than string in reconciliation (date, integer).
Two users indicated the need for better documentation of reconciliation errors. Error messages or feedback are insufficient in various scenarios, such as reconciliation slowdowns or large file uploads, so the user doesn't know whether the issue comes from OpenRefine, the reconciliation service, or the target service. This is something the team worked on via the NDFI 2023 grant; we also have an open PR, #5944.
I discussed Wikimedia Commons with only one user. We discussed the following:
- Input of raw Wikitext is not intuitive; some users prefer more structured text.
- Working on editing lexemes for Wikidata, which requires more design on the OpenRefine side. See also the conversation Wikitext in OpenRefine - #6 by olea
- Fixing issues with large file uploads in Commons (files over 100MB cannot be uploaded; this is often the case with TIFF files). See Bug report: IO error while editing: Unexpected character ('<' (code 60)) - #9 by antonin_d
- A more in-depth review of WikiCommons usage is available here: Results of two user surveys for Wikimedia Commons users of OpenRefine
One user expressed interest in better containerization for running OpenRefine locally and hosted. Proposal for a new repository: containerizations for OpenRefine
One user is interested in multi-core utilization in OpenRefine for parallel computations when deployed on a cluster of servers.
Two trainers reported that when working in a hosted environment (for example with PAWS), their students often expect it to be a collaborative environment.