LLMs are powerful tools for cleaning and enriching data, extracting entities, and generating translations. Thanks to @Sunil_Natraj, there is an excellent AI extension for OpenRefine that enables the use of local and remote LLMs with apps and services like Ollama, llama.cpp, and OpenRouter, as well as most other AI services based on the OpenAI API.
In this session, I will demonstrate how to install and set up the extension in OpenRefine. Following the demonstration, I would like to discuss use cases and applications for AI in the context of data wrangling.
This session was cancelled as @Michael_Markert was not available to present. Instead, @Martin gave a small demo of how the extension works, and the group discussed the use of LLMs with OpenRefine. The notes from the shared Etherpad are available here.
Interesting. I missed the demo. I managed to install the extension, but is there good documentation on the next step (setting up an LLM provider in OpenRefine)?
Participants discussed possible uses of the OpenRefine LLM extension and shared ideas on how it could support data wrangling workflows.
Potential use cases
Several categories of use cases were highlighted based on the extension documentation.
Content transformation
- summarization
- translation
- style conversion
- format standardization
These could be useful when preparing text datasets or normalizing descriptions across records.
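As a rough illustration of transformations like these: the extension targets OpenAI-compatible services, so a cell transformation presumably boils down to sending the cell value together with an instruction to a chat-completions endpoint. The sketch below (plain Python, outside OpenRefine) builds such a request body for a format-standardization task; the model name, instruction, and `temperature` choice are assumptions for the example, not the extension's actual defaults.

```python
import json

def build_chat_payload(instruction, cell_value, model="llama3"):
    """Build an OpenAI-style chat-completion request body for one cell.

    The same payload shape works for a local Ollama server
    (http://localhost:11434/v1/chat/completions) or a hosted
    OpenAI-compatible service.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": instruction},
            {"role": "user", "content": cell_value},
        ],
        # Low temperature keeps the output deterministic, which matters
        # more for data cleaning than for creative generation.
        "temperature": 0,
    }

payload = build_chat_payload(
    "Standardize the date to ISO 8601 (YYYY-MM-DD). Reply with the date only.",
    "March 5th, 1998",
)
print(json.dumps(payload, indent=2))
```

One request per cell can get slow and costly on large columns, so testing the prompt on a small sample first (as OpenRefine's preview encourages) is worthwhile.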
Information extraction
- entity recognition
- key fact extraction
- timeline creation
- relationship mapping
Participants noted that LLMs could help extract structured information from unstructured text fields.
Content analysis
- sentiment analysis
- theme identification
- category classification
These approaches may help classify or analyze textual datasets before further cleaning or enrichment.
Multimodality
The possibility of multimodal workflows was also mentioned.
For example, the extension could potentially:
- analyze images and return structured descriptions
- extract information from images
- interpret textual descriptions of images
Participants suggested that combining this with controlled vocabularies or predefined datasets could help constrain outputs and make results easier to integrate into OpenRefine workflows.
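The constraint idea raised here can be sketched simply: after the model returns a free-text label, snap it onto the nearest term in a predefined vocabulary so the result slots cleanly into an OpenRefine column. The vocabulary and matching threshold below are invented for the example; a real workflow might instead use reconciliation against the vocabulary.

```python
import difflib

# Illustrative controlled vocabulary (not from the extension):
VOCAB = ["portrait", "landscape", "still life", "abstract"]

def snap_to_vocab(label, vocab=VOCAB, cutoff=0.6):
    """Return the closest vocabulary term for a model's label, or None.

    Exact (case-insensitive) matches win; otherwise fall back to fuzzy
    matching, rejecting anything below the similarity cutoff so junk
    labels are flagged rather than silently mapped.
    """
    label = label.strip().lower()
    if label in vocab:
        return label
    matches = difflib.get_close_matches(label, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(snap_to_vocab("Landscape"))   # exact match after normalization
print(snap_to_vocab("Portraits"))   # fuzzy match to "portrait"
print(snap_to_vocab("xyzzy"))       # no plausible match -> None
```

Rejecting low-similarity labels instead of forcing a match keeps hallucinated categories visible for manual review, which fits OpenRefine's facet-and-fix workflow.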
Model Context Protocol (MCP)
Another topic raised was the potential relationship between the extension and the Model Context Protocol (MCP).
Supporting MCP could allow OpenRefine to interact with other tools or agents in a larger AI workflow, potentially allowing external systems to guide or orchestrate OpenRefine tasks.