Presentation
This post focuses on the results of the feature voting conducted alongside the survey using the All Our Ideas platform. The vote took place between August 1, 2024, and October 7, 2024. Of the 226 survey participants, 109 took part in the feature ranking (only participants in our bi-yearly survey had access to the link). A total of 9,077 votes were cast, for over 16 hours of cumulative voting time. On average, each participant spent 9 minutes voting and answered 83 questions. However, this average was influenced by a few highly active users: most participants voted on fewer than 50 questions, while some answered over 300, and one participant reached as many as 1,280.
To enhance readability, I grouped similar suggestions together and adjusted their votes. I manually approved each user's suggestion to prevent spam and avoid duplicate entries. The final score includes calculating a weighted average for similar features or boosting the results for duplicates that weren't presented to participants.
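As a rough illustration of the merging step, here is a minimal sketch of a vote-weighted average. It assumes the weights are each suggestion's vote count, which the post does not state; the function name and the vote counts in the usage note are hypothetical.

```python
def merged_score(entries):
    """Combine the scores of similar suggestions into one score.

    entries: list of (score, votes) pairs, one per merged suggestion.
    Assumption: each score is weighted by the number of votes it received.
    """
    total_votes = sum(votes for _, votes in entries)
    if total_votes == 0:
        raise ValueError("merged suggestions need at least one vote")
    return sum(score * votes for score, votes in entries) / total_votes
```

With equal weights this reduces to a plain average: `round(merged_score([(60, 100), (54, 100), (64, 100)]))` gives 59 for the three scores quoted in the table comment on macros. Unequal vote counts would shift the result toward the more heavily voted suggestion.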
It’s important to note that features with fewer total votes (especially those introduced late in the survey) had less exposure to participants than other suggestions, which may have influenced their final score (positively or negatively). I left a note on the six suggestions with fewer than 208 votes to ensure fair interpretation (the 208-vote threshold was determined by taking the bottom 10% of vote totals).
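The low-exposure flag can be sketched as a simple percentile cut. The exact percentile method behind the 208-vote threshold is not stated, so the rank-based rule below is an assumption, and the function names are hypothetical.

```python
import math

def low_exposure_cutoff(vote_counts, fraction=0.10):
    """Return the smallest vote count that lies above the bottom `fraction`.

    Assumption: a simple rank-based cut on the sorted vote counts.
    """
    ordered = sorted(vote_counts)
    if not ordered:
        raise ValueError("no vote counts given")
    k = math.ceil(fraction * len(ordered))  # how many items form the bottom slice
    return ordered[min(k, len(ordered) - 1)]

def flag_low_exposure(votes_by_feature, fraction=0.10):
    """Return the names of features with strictly fewer votes than the cutoff."""
    cutoff = low_exposure_cutoff(votes_by_feature.values(), fraction)
    return {name for name, n in votes_by_feature.items() if n < cutoff}
```

Flagging with a strict "fewer than" comparison mirrors the wording used for the 208-vote threshold; features tied at the cutoff are not flagged.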
Interpreting the Scores:
The score assigned to each feature reflects its relative preference based on head-to-head comparisons made by participants during the survey. Each time a feature was selected over another, it earned a "win." The score is calculated based on the ratio of wins to the total number of votes. Here’s how to interpret the scores:
- High Scores (Above 65): Calculated as the mean + 1 standard deviation, these indicate strong community preference. Features with high scores won more frequently, reflecting importance or desirability.
- Middle Range Scores (35-65): These scores indicate mixed responses; while often ranked favorably, these features also lost some comparisons, suggesting they are useful but not critical to all users.
- Low Scores (Below 35): Calculated as mean - 1 standard deviation, low scores indicate less frequent wins in comparisons. These features may represent more niche requests or less urgent needs.
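The interpretation above can be sketched in a few lines. This is an illustration under stated assumptions, not the platform's actual scoring code: it takes the score to be the plain win ratio scaled to 0-100 and uses the population standard deviation, neither of which is confirmed by the post. All function names are hypothetical.

```python
from statistics import mean, pstdev

def win_ratio_score(wins, total_votes):
    """Score as the share of comparisons won, scaled to 0-100."""
    if total_votes == 0:
        raise ValueError("a feature needs at least one vote to be scored")
    return 100 * wins / total_votes

def band_cutoffs(scores):
    """Return (low, high) cut-offs at mean - 1 SD and mean + 1 SD.

    Assumption: population standard deviation over all feature scores.
    """
    m = mean(scores)
    sd = pstdev(scores)
    return m - sd, m + sd

def band(score, low=35, high=65):
    """Classify a score using the cut-offs quoted in the post by default."""
    if score > high:
        return "high"
    if score < low:
        return "low"
    return "middle"
```

For example, `band(76)` returns `"high"`, `band(50)` returns `"middle"`, and `band(20)` returns `"low"`; scores exactly at 35 or 65 fall in the middle range.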
Feature Ranking
The column labeled Source indicates the origin of each feature suggestion:
- User: A feature suggested by a participant, subsequently included in the vote.
- Seed: A feature pre-selected as part of the initial list of options in the survey.
If you believe that any of the issue relationships listed here need to be updated (added or removed), please let me know. We want to ensure that all feature requests are accurately linked to the relevant discussions and issues on GitHub.
Score | Source | Feature Request | Comment |
---|---|---|---|
76 | User | Remove duplicate rows: rows that already exist are deleted; optional feature: only select certain columns to be evaluated for comparison | Related issue: #3218. This suggestion was introduced later in the survey and, therefore, received fewer votes compared to others. |
76 | Seed | Native Reconciliation with arbitrary external datasets, e.g. csv reconciliation, reconciliation with another OpenRefine project. Reconciliation against any dataset, not only the ones with a reconciliation endpoint. Reconciliation locally is a very useful feature. I suggest local reconciliation based on projects in OpenRefine instead of using external services. (from the survey's open-ended questions) | The current solution is to use one of the CSV-based reconciliation services. Related issue: #2003 |
73 | User | Allowing import again for adding new rows to existing projects. Easy adding of additional rows/importing rows from other datasets. Update existing project with new data (append rows from a file). Extend datasets: add rows to existing datasets from files, assuming the same column structure/order. The inability to update imported data is a limiting factor (user survey) | Related issue: #715 |
72 | User | Making joining records easier across 2 datasets by multiple keys! | Need to create an issue |
72 | User | Easier rename a column by clicking on it and don't break any facets that depend on it | Loosely related issue: #6282 |
68 | Seed | Loading and working very large projects more easily/smoothly (100,000s of rows/records). Loading and working large datasets could increase OR's user base in both numbers and variety to a great extent in my opinion. | Related to the scaling - 4.x branch |
67 | User | Allow canceling at any time those long spinning operations like Clustering | Clustering will see many improvements in the upcoming 3.9 version. Need to create an issue for canceling operations that take a long time to compute. |
67 | Seed | Better support of nested structure (improved record mode) | See discussion in Representing hierarchical data: beyond the records mode? |
67 | User | More GREL options that allow for creating and using variables for your dataset | Need to create an issue |
65 | Seed | Option to save facet | Related issue: #560 |
62 | Seed | Support Python 3 as an expression language - Continue maintaining Python / Jython option. | Also requested in the survey's open-ended questions. Related issue: #2249 |
61 | Seed | Drag and drop for columns | Issue to be created |
60 | Seed | Native Reconciliation against a SPARQL query | This is supported via the RDF extension but may not be known to all users. How can we better advertise it? |
59 | Seed | Support for more diverse (human language) alphabets/scripts, date and time formats... | No related issue. This question comes from the Diversity grant. |
58 | User | Allow bookmarking and naming starred GREL expressions so they can show in a Star top-level menu. (seed) / Allow Users to customize a Custom Menu to save macro (user suggestion) / More 'point and click' functions to replace GREL (seed) | The consensus from discussions is to promote an option for users to create macros. I merged the two seeded questions (individual scores of 60 and 54) and the user suggestion (score of 64) for an average score of 59. This is loosely related to the suggestion of allowing the sharing and exploring of public expressions. Related issue: #109 |
58 | User | Quick delete all rows having empty cells | Related issue: #1472 |
58 | Seed | In context help for GREL or wizard-like approach to writing GREL (like Excel) | Mentioned in User Interviews Results Part 2: Exploring Feedback Regarding OpenRefine Feature and User Experience. This issue would require additional design work to better scope and refine the feature. |
58 | Seed | More and better notifications, error messages, and warnings in OpenRefine | This is tracked via many issues (and most likely many more to create) under the error handling tag in GitHub |
58 | Seed | Multi-user support: allowing two or more people to work on the same project | Related issue: #101 and discussion at the 2024 Barcamp OpenRefine 2024 Barcamp: OpenRefine as a Service |
53 | Seed | Some simple data visualization features | Related issue: #5315 |
53 | User | Don't go back to the beginning after matching during reconciliation | Related issue: #33 and #6546. The improvement is part of the upcoming release of version 3.9. This suggestion was introduced later in the survey and therefore received fewer votes compared to others. |
53 | Seed | Option to refactor the JSON operation scripts to edit a facet, update a GREL command, or add a step | Related discussion Which reproducibility should we focus on? - #5 by Martin |
51 | Seed | Improved integration with cloud storage services for data import and export. | Need to create an issue and better scope this feature. |
50 | User | Supported REST API for external use | This is already supported via "Add column by fetching URLs". During the 2024 BarCamp we discussed supporting OpenAPI within OpenRefine; see OpenRefine 2024 Barcamp: Support OpenAPI in OpenRefine |
49 | Seed | Improved JSON parsing when calling API | Related issues: #1440 #2515 |
49 | User | Faster rendering of many columns in Record mode | |
48 | User | Integrate a call to HuggingFace AI models to automate tasks (see HuggingSheet) | @Michael_Markert shows how to integrate OpenRefine with an LLM in Using local ChatGPT-like LLMs in OpenRefine for data wrangling. This suggestion was introduced later in the survey and, therefore, received fewer votes compared to others. |
47 | Seed | Make OpenRefine easier to learn and get started with better or easier UX / interface | This would need dedicated design effort |
46 | Seed | Pause and resume my operations in OpenRefine | Related discussion: Partial results of long-running operations |
46 | Seed | Save Template exports | Related issues: #1928 #468 |
45 | Seed | Allow users to set precise values for numeric facets | Related issues: #5168 #5008 |
44 | Seed | Less abandoned OpenRefine extensions: only present maintained and currently operational ones | A participant indicated in the open-ended question of the survey that many plugins and services are VERY DATED and look abandoned. This needs a refresh. See also the conversation in Improving the UX of extension install, and Butterfly |
44 | Seed | Better support of MARC format for complex dimensions and repeating elements. | Related issues: #794 #2127 |
44 | Seed | A walkthrough tutorial inside the software itself, to introduce and guide new users | This would need dedicated design effort |
43 | User | Ability to extend data by bringing in qualifiers from Wikidata | This suggestion was introduced later in the survey and therefore received fewer votes compared to others. |
41 | User | Supported client/client library based on REST API | |
40 | User | AI integrated help for writing regular expressions, GREL etc | May be related to the suggestion In context help for GREL or wizard-like approach to writing GREL (like Excel) |
38 | User | Default Wikimedia support as a core OpenRefine feature | |
37 | User | Add transform for book-style Title casing | Need to create an issue |
37 | User | Allow sharing and exploring of public expressions | This is discussed in Which reproducibility should we focus on? This is loosely related to #109 |
35 | Seed | Faster upload to Wikibase, Wikidata, or other Wikimedia projects – Fully maintained production Wikidata reconciliation service with better reconciliation and performance | See the summary of the discussion during the 2024 Barcamp OpenRefine 2024 Barcamp: Reconciliation in OpenRefine |
34 | Seed | An online, hosted instance of OpenRefine | This is often requested by trainers as a replacement for the unstable mybinder deployment. See also the summary of the discussion during the 2024 Barcamp OpenRefine 2024 Barcamp: OpenRefine as a Service |
34 | Seed | A keyboard-accessible GUI | This is supported via the keyboard acceleration extension prototype. See discussion Keyboard acceleration extension prototype and repo |
32 | User | Work on reconciliation of Wikidata Lexemes | Related issue: #2240 and forum discussion OpenRefine support for Lexemes in Wikidata: how would you use this?. This suggestion was introduced later in the survey and therefore received fewer votes compared to others. |
32 | Seed | Delete multiple projects at once | Related issue: #4965 |
29 | User | Parquet import/export | Related issue: #1929 |
26 | User | Easy start/stop of OpenRefine on Windows | Related issue: #3221 |
24 | User | Better support for SELF HOSTED wikidata instances. (setting up manifests, and creating data previews (when reconciling) is full of dark secrets). Wikibase cloud reconciliation - Improved integration with Wikibase would be important to me because right now I have to make do with some workarounds that can be time-consuming | I suppose this suggestion is related to Wikibase instances and not OpenRefine itself. See this thread regarding the effort to make this process easier: Fundraising to commission the development of a MediaWiki extension for reconciliation with Wikibase. This suggestion was introduced later in the survey and therefore received fewer votes compared to others. |
20 | User | Support HDF5 importer and selecting a file within it | Related issue: #640 |
20 | Seed | Support R as an expression language | Related issue: #1226 |
Additional Feature Requests
These are suggestions gathered from the open-ended questions in the user survey. They were not part of the options available for voting but provide valuable insights into user needs and potential improvements.
- Dark mode would be greatly appreciated: Related issue: #3017
- An official Docker Hub image would be nice. This is available in GitHub - OpenRefine/containers: Collection of containerized packages of OpenRefine; see the discussion Proposal for a new repository: containerizations for OpenRefine
- Any features about geographic coordinates will be very useful. Related issue: #6570 and forum discussion OpenRefine 2024 Barcamp: Making OpenRefine more useful as exploratory tool
- Connect with Zotero for reconciliation and publication. Issue to be created.
- ODS spreadsheets fail to upload. Related issues: #6877, #3055, #2243
- Adjust columns. Related issue: #4806
- Increase the size of the preview window when we are working on the column. I work with really long values, and sometimes, I can't even see one full value in the preview.
- I really wish there was a settings config in the GUI that would highlight what could be modified and what the current configuration files are, even if they can't be modified from the menu. This would expose what could be customized by the user in the config and provide a guide on how to extend or what settings could be modified. In particular, this could be used to highlight new defaults and new reconciliation services and extensions, both of which are only really visible if you dive deep into the help or are working in one of those areas. The ability to discover that they exist at all, if they changed, or their current status (i.e. broken, working, slowed) etc. would benefit from a settings menu in the OpenRefine GUI for cross-discovery and lowering the entry point to OpenRefine's more useful integrations.