OpenRefine 2024 Barcamp: If only OpenRefine could be more like

Martin · July 9, 2024, 1:40pm

This session was a roundtable where participants presented other software from which OpenRefine should take inspiration. See the Barcamp page

Data Wrangler Extension (for Visual Studio Code)

Proposed by: @martin
URL: Data Wrangler Extension (for Visual Studio Code)

Visual data cleaning in the IDE for Python, bringing the "What you see is what you get" aspect of OpenRefine to developers. It comes with many reproducibility guarantees since it's based on Python.
@thadguidry: Can be overwhelming even for quite technical folks.

Dataiku - AI data prep

Proposed by: @martin
URL: Dataiku - AI data prep

In 2014-2015, various start-ups raised funds for ideas similar to OpenRefine (see also Trifacta below). It's interesting to see where they went.
Dataiku uses generative AI to replace GREL and other expression languages. Instead of writing an expression, you write in plain text what you want to do and generative AI takes care of the translation.
All steps are recorded in the history, similar to OpenRefine (likely a big inspiration).
@thadguidry: Makes working with data easy, matches well with the goals for OpenRefine.
@ostephens: How well does it deal with more complicated tasks?
@ostephens: Is the good thing the natural language parsing into expressions, or is there more clever processing behind it?
Many non-technical users struggle with generating even simple GREL expressions. I feel like the natural language parsing for instructions is the most important part here.

Trifacta and Alteryx have also been mentioned in relation to Dataiku as enterprise data preparation tools. Specifically, visualization of column data overlaps with Ydata profiling mentioned below. They offer immediate feedback to users about their data without further actions.
Trifacta is the same technology as Google Data Prep. It is more opinionated than OpenRefine, highlighting potential issues with the data (datatype outliers…).

QGIS

Proposed by: @jfaurel
URL: QGIS

Ways to easily manipulate and visualize spatial data and shapefiles.
Many users work with "shapefiles," a set of files that work together to produce a visual output.
QUESTION: Aren't shapefiles slowly being abandoned by the industry in favor of GeoJSON?
The transition will take a long while since many actors have loads of historical data as shapefiles.
How do we join multiple shapefiles together? This is an ongoing request in OpenRefine to add data to an existing project (See #715). The limitation with OpenRefine is that you have only one go to create your project. (Similar point for LibreOffice Calc below.)

LibreOffice Calc

Proposed by: @Ainali
URL: LibreOffice Calc

Spreadsheet software with CLI capabilities, for combining files for example.
Pre-preparation for OpenRefine. When you have many files with similar data (e.g., 21 counties or 290 municipalities), these cannot be merged in OR (outside of the initial project creation) but must be done in the pre-preparation step.
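
As an illustration of that pre-preparation step, here is a minimal sketch in Python (standard library only) that stacks several CSV files with similar columns into one table before import. The function names, the extra "source" column, and the sample county data are all made up for the example; this is not how LibreOffice Calc or OpenRefine do it.

```python
import csv
import io


def merge_csv_texts(sources):
    """Concatenate several CSV texts into one list of row dicts.

    `sources` maps a label (e.g. a file name) to CSV text. A hypothetical
    "source" column records where each row came from. Columns missing
    from a given source are filled with empty strings.
    """
    rows = []
    fieldnames = ["source"]
    for label, text in sources.items():
        reader = csv.DictReader(io.StringIO(text))
        # Collect the union of all column names, in first-seen order.
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        for row in reader:
            rows.append({"source": label, **row})
    # Fill gaps so every row has every column.
    merged = [{name: row.get(name, "") for name in fieldnames} for row in rows]
    return fieldnames, merged


def to_csv(fieldnames, rows):
    """Serialize the merged rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample data standing in for per-county files.
county_a = "municipality,population\nAlpha,1200\nBeta,3400\n"
county_b = "municipality,population,area_km2\nGamma,560,12.5\n"

fieldnames, rows = merge_csv_texts(
    {"county_a.csv": county_a, "county_b.csv": county_b}
)
print(to_csv(fieldnames, rows))
```

The merged output can then be imported as a single OpenRefine project. In practice you would read the texts from files on disk instead of inline strings.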

Microsoft Excel

Proposed by: @b2m
URL: Microsoft Excel

Office 365 now supports inline Python. This is bringing people back to Excel. However, this feature requires the cloud, so there are some privacy concerns. It is worth noting that MS PowerTools adds OpenRefine-like functionality to Excel.

Postman

Proposed by: @ostephens
URL: Postman

@thadguidry: I love the idea of making our Fetch have many of the features of Postman.
This would also help even more with JSON-LD and linked data in general.
Fetch URLs could be extended (or a separate extension).
Tool for interacting with web APIs.
Relates to OpenRefine add column based on URLs, parsing JSON, import data from a remote data source (fetching).
When you make a request, you can set variables.
You can import a CSV of data and use it when calling an API; Postman stores the CSV.
Useful for users using OR as a web scraping tool.
But a very technical tool; you need to get used to another scripting language.
Supports recipe sharing.
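
To make the "CSV of data plus variables" idea concrete, here is a rough Python sketch that fills `{{variable}}` placeholders in a URL template from each CSV row. It only builds the URLs and sends nothing. The endpoint `api.example.org`, the column names, and the function names are assumptions for illustration, not Postman's or OpenRefine's actual implementation.

```python
import csv
import io
import re
from urllib.parse import quote

# Matches placeholders written as {{name}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template, variables):
    """Replace each {{name}} with the URL-encoded value of variables[name]."""
    def substitute(match):
        name = match.group(1)
        return quote(str(variables[name]), safe="")
    return PLACEHOLDER.sub(substitute, template)


def urls_from_csv(template, csv_text):
    """Build one URL per CSV row, using the column names as variables."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [fill_template(template, row) for row in reader]


# Hypothetical endpoint and made-up sample rows.
template = "https://api.example.org/lookup?name={{name}}&country={{country}}"
data = "name,country\nAda Lovelace,GB\nGrace Hopper,US\n"

urls = urls_from_csv(template, data)
for url in urls:
    print(url)
```

A placeholder with no matching column raises a `KeyError`, which is probably what you want while debugging a template.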

Open Data Editor

Proposed by: @Susanna_Anas
URL: Open Data Editor

Grant-supported.
Targets governments who want to publish and share data.
OKFN will be at Wikimania.

Zulip

Proposed by: @antonin_d
URL: Zulip

Great contributor documentation as an example.
PDF version of the doc is useful (@jfaurel) (see ticket #275).
@thadguidry: I like the categorization; we can definitely take inspiration from those sub-categories.
@ostephens: Would like to see evidence that the documentation is actually helping.

Protege

Proposed by: @AtesComp
URL: Protege

Used in the context of the semantic web, open source project.
Standalone and web-based application, supports extensions. Built in Java - can give us good ideas on how to approach the extension plugin.

Ydata profiling

Proposed by: @Michael_Markert
URL: Ydata profiling

Quick reporting on a dataset's qualities. Very interesting for data exploration.
Python library for pandas DataFrames.
Import a table and print a report with basic features of the data.
It is like generating facets for all fields in a project automatically and presenting the results in an HTML/PDF report, including visualizations fitting the data type of the facet.
@thadguidry: I've been using Polars more and more.
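
The "facets for all fields" comparison can be sketched without any profiling library. The snippet below is a simplified, standard-library stand-in for the idea (not ydata-profiling's API): it counts values per column, much like a text facet would, and returns a small summary. The function name, the report keys, and the sample data are made up.

```python
import csv
import io
from collections import Counter


def profile(csv_text, top=3):
    """Return a facet-like summary for every column of a CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: Counter() for name in reader.fieldnames or []}
    total = 0
    for row in reader:
        total += 1
        for name in columns:
            # Treat missing cells as blank values.
            columns[name][row.get(name) or ""] += 1
    report = {}
    for name, counts in columns.items():
        non_blank = [(value, n) for value, n in counts.most_common() if value != ""]
        report[name] = {
            "rows": total,
            "blank": counts.get("", 0),
            "distinct": len(non_blank),
            "top": non_blank[:top],  # most frequent values first
        }
    return report


# Made-up sample table.
sample = "city,country\nParis,FR\nLyon,FR\nParis,FR\nBerlin,\n"
report = profile(sample)
for column, summary in report.items():
    print(column, summary)
```

A real profiling report adds type inference, statistics, and charts on top of counts like these.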

MessyDesk

Proposed by: @Susanna_Anas
URL: MessyDesk

Visual workflow editor to visualize the pipeline.

Jupyter Notebook

Proposed by: @AtesComp
URL: Jupyter Notebook

Example of literate programming.
Supports different kernels (Python, Julia, R, ...).
Combines text (Markdown) with code and its output.
A better collaboration environment might be Colab, but there is also JupyterHub.

Text editor, IDE, and command line tools

Proposed by: @ostephens

Sometimes this is faster than OpenRefine for sorting, de-duplicating, and editing smaller lists/data sets. It's really difficult to do the really simple things in OpenRefine, like deduplicating a list.
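
For comparison, this is roughly what "deduplicate and sort a list" looks like as a few lines of Python. It is just one possible sketch: the function name and the choices to trim whitespace, skip blank lines, and ignore case by default are assumptions.

```python
def dedupe_sorted(lines, case_sensitive=False):
    """Trim, drop blanks and duplicates, then sort alphabetically.

    The first spelling seen for each value is the one that is kept.
    """
    seen = set()
    unique = []
    for line in lines:
        value = line.strip()
        if not value:
            continue  # skip blank lines
        key = value if case_sensitive else value.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(value)
    # Sort case-insensitively so "paris" and "Paris" land together.
    return sorted(unique, key=str.casefold)


names = ["Berlin", "paris ", "Paris", "", "berlin", "Amsterdam"]
result = dedupe_sorted(names)
print(result)
```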

Datasette

Proposed by: @b2m
URL: Datasette

Focus on visualizing and exploring data.
Author comes from a data journalism background, and examples are often in this area.
Example: Global Power Plants Dataset.

Tabula

Proposed by: @ostephens
URL: Tabula

Extract tabular data from PDFs.
Last release 2018.
It would be nice if OpenRefine supported PDF as an import format and tried to extract tables from it.

Antelope

Proposed by: @lozanaross
URL: Antelope and Antelope Service.
Connect data about entities to multiple ontology/vocab sources and let the user pick the right one (with some ML-automation for the suggestions as well).
Looking to have an Antelope extension for OpenRefine; first working on a Wikibase extension.

TS4NFDI

Proposed by: @lozanaross
URL: TS4NFDI

The idea behind this software is to reconcile against different vocabularies/systems at the same time to find any identifiers, not just the one in GND XOR Wikidata XOR ...
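
A toy Python sketch of "reconcile against several vocabularies at once" could look like the following. It is not TS4NFDI's API: the in-memory lookup tables stand in for real reconciliation services, and the labels and identifiers are placeholders, not real GND or Wikidata IDs.

```python
def reconcile_all(label, vocabularies):
    """Look a label up in every vocabulary and collect all matching IDs.

    `vocabularies` maps a vocabulary name to a dict of
    normalized label -> identifier (a stand-in for a real service).
    """
    key = label.strip().casefold()
    matches = {}
    for vocab_name, index in vocabularies.items():
        identifier = index.get(key)
        if identifier is not None:
            matches[vocab_name] = identifier
    return matches


# Placeholder data: these identifiers are invented for the example.
vocabularies = {
    "GND": {"example person": "gnd-placeholder-1"},
    "Wikidata": {
        "example person": "wd-placeholder-1",
        "example place": "wd-placeholder-2",
    },
}

print(reconcile_all("Example Person", vocabularies))
print(reconcile_all("Example Place", vocabularies))
```

The point is that one query returns every identifier found, instead of forcing a choice of a single target vocabulary up front.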
