2025 Barcamp Session Proposal: Using the OpenRefine LLM Extension

Participants discussed possible uses of the OpenRefine LLM extension and shared ideas on how it could support data wrangling workflows.

Potential use cases

Several categories of use cases were highlighted, drawing on the extension's documentation.

Content transformation

  • summarization
  • translation
  • style conversion
  • format standardization

These could be useful when preparing text datasets or normalizing descriptions across records.
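Transformations like these are typically driven by a per-cell prompt. As a minimal sketch (the template wording and function name are illustrative, not part of the extension's actual configuration), a format-standardization prompt could be built like this:

```python
# Sketch: building a per-cell prompt for format standardization.
# The prompt template and function name are illustrative assumptions,
# not the extension's actual API.

def build_standardization_prompt(value: str, target_format: str) -> str:
    """Wrap a cell value in an instruction asking the model to
    rewrite it in a fixed target format, with no extra commentary."""
    return (
        f"Rewrite the following value as {target_format}. "
        "Return only the rewritten value, with no explanation.\n\n"
        f"Value: {value}"
    )

prompt = build_standardization_prompt(
    "1st of March, 2024", "an ISO 8601 date (YYYY-MM-DD)"
)
print(prompt)
```

Asking the model to return only the rewritten value keeps the response easy to drop back into a cell without post-processing.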

Information extraction

  • entity recognition
  • key fact extraction
  • timeline creation
  • relationship mapping

Participants noted that LLMs could help extract structured information from unstructured text fields.
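One practical pattern is to prompt the model for JSON and then flatten its reply into values suitable for new columns. The response shape below ("entities" with "text"/"type" keys) is an assumed prompt contract, not a fixed API:

```python
import json

# Sketch: turning an LLM's JSON reply into grouped fields suitable for
# new OpenRefine columns. The "entities"/"text"/"type" structure is an
# assumed prompt contract, not part of the extension.

def parse_entities(llm_reply: str) -> dict:
    """Group extracted entities by type; tolerate malformed replies
    by returning an empty dict instead of raising."""
    try:
        data = json.loads(llm_reply)
    except json.JSONDecodeError:
        return {}
    grouped = {}
    for ent in data.get("entities", []):
        grouped.setdefault(ent["type"], []).append(ent["text"])
    return grouped

reply = ('{"entities": [{"text": "Ada Lovelace", "type": "PERSON"},'
         ' {"text": "London", "type": "PLACE"}]}')
print(parse_entities(reply))
# {'PERSON': ['Ada Lovelace'], 'PLACE': ['London']}
```

Tolerating malformed replies matters here, since models do not always return valid JSON.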

Content analysis

  • sentiment analysis
  • theme identification
  • category classification

These approaches may help classify or analyze textual datasets before further cleaning or enrichment.
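For classification to be usable in facets, the model's free-text answers usually need to be snapped onto a closed label set. A minimal sketch (the labels and synonym table are illustrative):

```python
# Sketch: normalizing free-text sentiment answers onto a closed label
# set so downstream OpenRefine facets stay clean. Labels and the
# synonym table are illustrative assumptions.

LABELS = {"positive", "negative", "neutral"}
SYNONYMS = {"pos": "positive", "neg": "negative", "mixed": "neutral"}

def normalize_label(raw: str) -> str:
    """Lower-case, strip trailing punctuation, map known synonyms,
    and fall back to 'unclassified' for anything off-list."""
    cleaned = raw.strip().strip(".!").lower()
    cleaned = SYNONYMS.get(cleaned, cleaned)
    return cleaned if cleaned in LABELS else "unclassified"

print(normalize_label("Positive."))    # positive
print(normalize_label("somewhat ok"))  # unclassified
```

The explicit "unclassified" fallback makes off-list answers easy to facet on and review by hand.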

Multimodality

The possibility of multimodal workflows was also mentioned.

For example, the extension could potentially:

  • analyze images and return structured descriptions
  • extract information from images
  • interpret textual descriptions of images

Participants suggested that combining this with controlled vocabularies or predefined datasets could help constrain outputs and make results easier to integrate into OpenRefine workflows.
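One way to apply that constraint is to fuzzy-match the model's free-text label against the vocabulary and keep only near matches. A sketch using the standard library's `difflib` (the vocabulary terms are illustrative):

```python
from difflib import get_close_matches

# Sketch: constraining a model's free-text image label to the nearest
# term in a controlled vocabulary. The vocabulary is an illustrative
# example, not a real authority list.

VOCAB = ["photograph", "engraving", "lithograph", "manuscript", "map"]

def to_vocab_term(llm_label, cutoff=0.6):
    """Return the closest vocabulary term, or None if nothing
    is similar enough to trust."""
    matches = get_close_matches(llm_label.lower(), VOCAB, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(to_vocab_term("Photo graph"))  # photograph
print(to_vocab_term("zzz"))          # None
```

Returning None rather than a forced match leaves genuinely ambiguous outputs flagged for human review, which fits OpenRefine's review-oriented workflow.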

Model Context Protocol (MCP)

Another topic raised was the potential relationship between the extension and the Model Context Protocol (MCP).

Supporting MCP could allow OpenRefine to interact with other tools or agents in a larger AI workflow, potentially allowing external systems to guide or orchestrate OpenRefine tasks.
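MCP describes each tool an external agent may call with a name, a description, and a JSON Schema for its input. Purely as a sketch, an OpenRefine operation exposed this way might be described as follows (the tool name and parameters are hypothetical, not an existing OpenRefine API):

```python
import json

# Sketch: how an OpenRefine operation might be described as an MCP
# tool. MCP tools carry a name, a description, and a JSON Schema for
# their input; this particular tool and its parameters are hypothetical.

refine_tool = {
    "name": "apply_text_transform",  # hypothetical tool name
    "description": "Apply a GREL expression to a column in an OpenRefine project.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "project_id": {"type": "string"},
            "column": {"type": "string"},
            "expression": {"type": "string"},
        },
        "required": ["project_id", "column", "expression"],
    },
}

print(json.dumps(refine_tool, indent=2))
```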

A related discussion is available here: "Should we develop a MCP Server for OpenRefine?"

Data sharing and privacy considerations

Participants also emphasized the importance of considering what data is shared with LLM services.

When using hosted models through public APIs, standard privacy considerations apply:

  • understand what data is being sent to external services
  • verify whether organizational policies restrict sending certain datasets to external APIs
  • consider using local or self-hosted models if working with sensitive data

Choosing between local models and external services depends on the data being processed and institutional policies.
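That choice can be made explicit in configuration, for example by routing sensitive datasets to a local endpoint only. A minimal sketch (the endpoint URLs and sensitivity flag are illustrative; real deployments would follow institutional policy):

```python
# Sketch: routing requests to a local model for sensitive data and to a
# hosted API otherwise. The endpoint URLs and the sensitivity flag are
# illustrative assumptions.

LOCAL_ENDPOINT = "http://localhost:11434/v1"    # e.g. a self-hosted server
HOSTED_ENDPOINT = "https://api.example.com/v1"  # hypothetical hosted API

def pick_endpoint(dataset_is_sensitive):
    """Sensitive datasets never leave the local machine."""
    return LOCAL_ENDPOINT if dataset_is_sensitive else HOSTED_ENDPOINT

print(pick_endpoint(True))   # http://localhost:11434/v1
print(pick_endpoint(False))  # https://api.example.com/v1
```

Making the routing rule a single function keeps the policy auditable in one place rather than scattered across per-column settings.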