Participants shared examples of workflows where OpenRefine is used as part of larger data pipelines.
Julie: HPC resource allocation workflow
Julie presented a workflow for preparing data for HPC resource allocation.
The process involves a data-cleansing step between the XLS files and an allocation script. OpenRefine is used to review and clean the data visually before running the script.
Although the workflow could be implemented entirely in R, OpenRefine is kept in the pipeline because:
- It allows detailed visual inspection of the data
- Small adjustments are needed each year
- Keeping the cleaning step in OpenRefine is simpler than moving everything to R.
Uschi: Library data migration workflow
Uschi uses OpenRefine to convert library data from a legacy system into a parent library system.
The original data is loaded into OpenRefine to identify and review errors before sending corrections back to the source libraries.
Typical tasks include:
- fixing name formatting
- identifying shelf mark issues
- detecting encoding problems
OpenRefine is mainly used as a discovery tool in this workflow:
- using facets and filters
- testing regex filters to detect encoding issues
Edits are not made directly in OpenRefine; instead, issues are reported so they can be corrected in the original systems.
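The exact regex filters Uschi uses were not captured in the notes, but the kind of filter one might type into an OpenRefine text filter to surface mojibake can be sketched in Python (the pattern below is an illustrative assumption, matching the artifacts produced when UTF-8 text is mis-decoded as Latin-1):

```python
import re

# Typical mojibake artifacts: UTF-8 bytes decoded as Latin-1 yield
# sequences such as "Ã©" (for é) or "â€™" (for ’), and failed decodes
# leave the replacement character "�".
MOJIBAKE = re.compile(r"Ã[\x80-\xBF]|â€|\ufffd")

def looks_garbled(value: str) -> bool:
    """Flag cell values that probably suffered an encoding mix-up."""
    return bool(MOJIBAKE.search(value))
```

In OpenRefine the same pattern could be pasted into a text filter with "regular expression" enabled, then combined with facets to review the affected rows.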
Jan: Phone number formatting
Jan demonstrated a workflow for formatting phone numbers before publishing data.
This involved developing a regex transform expression in OpenRefine.
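The actual expression Jan developed was not recorded; a minimal Python sketch of this kind of transform (hypothetical rules, assuming a German default country code) might look like:

```python
import re

def normalize_phone(raw: str, country_code: str = "+49") -> str:
    """Normalize a phone number to a uniform international format.

    Illustrative rules only -- not the expression shown in the session:
    strip punctuation, rewrite a leading 00 as +, and replace a leading
    national 0 with the given country code.
    """
    digits = re.sub(r"[^\d+]", "", raw)    # keep digits and a leading +
    digits = re.sub(r"^00", "+", digits)   # 0049... -> +49...
    if digits.startswith("0"):             # national -> international
        digits = country_code + digits[1:]
    return digits
```

In OpenRefine the equivalent would be a GREL "Transform…" on the phone column using `replace()` with regex patterns.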
The discussion also touched on the OpenRefine Recipes page, which collects example expressions and workflows, e.g. "Create Wikitext for Wikimedia Commons uploads".
A question was raised about how the recipes page should evolve and how large it should become; see related discussions:
Srihari: Web scraping and Wikimedia uploads
Srihari presented the following workflow: web scraping → local database → OpenRefine → Wikimedia upload.
Data sources include repositories and public websites such as:
- Flickr
- US Navy
- US Army Corps of Engineers
- EUR-Lex
- University of Texas Libraries
OpenRefine is used to prepare the data before upload.
One example mentioned was handling ambiguous or incorrect metadata, such as incorrect license information on Flickr.
The pipeline uses n8n.io for orchestration.
It was also noted that OpenRefine offers many clustering and cleanup features that could potentially be useful in other tools if they were accessible through an API.
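To make the point concrete, OpenRefine's key-collision clustering with the fingerprint keying method can be approximated in a few lines of Python (a simplified sketch; the real implementation also normalizes accents and control characters):

```python
import re
from collections import defaultdict

def fingerprint(value: str) -> str:
    """Fingerprint key in the style of OpenRefine: lowercase, strip
    punctuation, then sort and deduplicate whitespace-separated tokens."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))

def cluster(values):
    """Key-collision clustering: group values whose fingerprints match."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [group for group in groups.values() if len(group) > 1]
```

Exposing operations like this through an API is what would let other tools in a pipeline reuse them without a manual OpenRefine session.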
Benjamin: Archival and research workflows
Benjamin presented several workflows used in archival and research contexts:
- OCR → data review → OpenRefine → creation of structured data → publication
- NER → reconciliation → enrichment as linked data
- Manual data collection → cleaning / deduplication → reconciliation → publication
Related blog posts describing these workflows and projects: