2025 Barcamp Session Proposal: Collecting impact stories workshop

Description

This session is inspired by the Wikimania talk "Demonstrating Impact: Telling Your Story as a Wikimedia Affiliate" (video), which had some interesting points about impact stories.

With upcoming fundraising efforts in mind, it would be really useful for us to start collecting stories about the impact OpenRefine has had. On slide 15 they show a way to capture these as reusable resources for fundraising. In this session, we will start collecting such stories from our own experiences. Each participant will get a template to fill in, and together we will work our way through it.

Format

Guided workshop. Probably 30-45 minutes long.

Session goals

A collection of resources that can be used in fundraising for OpenRefine.

This session was run along with 2025 Barcamp Session Proposal: Community Workflow Demos - notes are available in this Etherpad.

This session sounds interesting!

Cleaned-up notes from the pad:

Participants shared examples of workflows where OpenRefine is used as part of larger data pipelines.

Julie: HPC resource allocation workflow

Julie presented a workflow for preparing data for HPC resource allocation.

The process involves a data-cleansing step between the XLS files and an allocation script. OpenRefine is used to review and clean the data visually before running the script.

Although the workflow could be implemented entirely in R, OpenRefine is kept in the pipeline because:

  • It allows detailed visual inspection of the data
  • Small adjustments are needed each year
  • Keeping the cleaning step in OpenRefine is simpler than moving everything to R

Uschi: Library data migration workflow

Uschi uses OpenRefine to convert library data from a legacy system into a parent library system.

The original data is loaded into OpenRefine to identify and review errors before sending corrections back to the source libraries.

Typical tasks include:

  • fixing name formatting
  • identifying shelf mark issues
  • detecting encoding problems

OpenRefine is mainly used as a discovery tool in this workflow:

  • using facets and filters
  • testing regex filters to detect encoding issues

Edits are not made directly in OpenRefine; instead, issues are reported so they can be corrected in the original systems.
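As a rough illustration of the kind of regex check described above, here is a minimal Python sketch. It is not Uschi's actual filter: the pattern, the function names `has_encoding_issue` and `flag_rows`, and the sample field `name` are all assumptions made for this example. The pattern only covers one common case (UTF-8 text that was decoded as Latin-1) plus the Unicode replacement character.

```python
import re

# Assumed pattern, for illustration only:
# - "Ã" (U+00C3) or "Â" (U+00C2) followed by a character in U+0080-U+00BF
#   is typical of UTF-8 text that was decoded as Latin-1 (e.g. "MÃ¼ller").
# - U+FFFD is the replacement character left behind by a failed decode.
MOJIBAKE = re.compile("[\u00c3\u00c2][\u0080-\u00bf]|\ufffd")


def has_encoding_issue(value: str) -> bool:
    """Return True if the value matches one of the assumed mojibake patterns."""
    return MOJIBAKE.search(value) is not None


def flag_rows(rows, field):
    """Return the rows whose `field` looks mis-encoded, without modifying them."""
    return [row for row in rows if has_encoding_issue(row.get(field, ""))]


# Hypothetical sample records; "M\u00c3\u00bcller" renders as "MÃ¼ller".
sample = [
    {"name": "M\u00c3\u00bcller"},  # mis-decoded "Müller"
    {"name": "M\u00fcller"},        # correctly encoded "Müller"
    {"name": "Caf\ufffd"},          # contains the replacement character
]

flagged = flag_rows(sample, "name")
```

In keeping with the workflow, the sketch only flags records for reporting; it does not attempt to repair them.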

Jan: Phone number formatting

Jan demonstrated a workflow for formatting phone numbers before publishing data.

This involved developing a regex transform expression in OpenRefine.
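The notes do not record the expression Jan used or the target format. As a hedged sketch of what such a regex-based transform might do, the Python function below normalizes a few common spellings to a single international form. The function name `normalize_phone`, the target format, and the default country code `49` are all assumptions chosen for the example, and the logic would have to be rewritten in the expression language actually used in OpenRefine.

```python
import re


def normalize_phone(raw: str, default_country_code: str = "49") -> str:
    """Normalize a phone number to "+<country code><number>" (assumed format).

    `default_country_code` is a hypothetical parameter, applied to numbers
    that carry no international prefix.
    """
    value = raw.strip()
    has_plus = value.startswith("+")
    # Drop everything that is not a digit (spaces, slashes, hyphens, ...).
    digits = re.sub(r"\D", "", value)
    if has_plus:
        return "+" + digits
    if digits.startswith("00"):
        # "00" is treated as an international call prefix.
        return "+" + digits[2:]
    if digits.startswith("0"):
        # A single leading zero is treated as a national trunk prefix.
        return "+" + default_country_code + digits[1:]
    return "+" + default_country_code + digits


examples = ["+49 30 1234-567", "0049 30 1234567", "030 / 1234 567"]
normalized = [normalize_phone(e) for e in examples]
```

All three example spellings end up as the same string, which is the point of normalizing before publication. Real data would need more rules (extensions, numbers written with "(0)", other trunk-prefix conventions).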

The discussion also touched on the OpenRefine Recipes page, which contains examples of expressions and workflows: Create Wikitext for Wikimedia Commons uploads

One question raised was how the recipes page should evolve and how large it should become; see related discussions:

Srihari: Web scraping and Wikimedia uploads

Srihari presented the following workflow: web scraping → local database → OpenRefine → Wikimedia upload

Data sources include repositories and public websites such as:

  • Flickr
  • US Navy
  • US Army Corps of Engineers
  • EUR-Lex
  • University of Texas Libraries

OpenRefine is used to prepare the data before upload.

One example mentioned was handling ambiguous or incorrect metadata, such as incorrect license information on Flickr.

The pipeline uses n8n.io for orchestration.

It was also noted that OpenRefine offers many clustering and cleanup features that could potentially be useful in other tools if they were accessible through an API.
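To give an idea of what reusing a clustering feature outside OpenRefine could look like, here is a small standalone Python sketch of key-collision clustering. It is loosely modelled on the "fingerprint" keying method described in the OpenRefine documentation, but it is neither OpenRefine's code nor an existing API: the function names `fingerprint` and `cluster` and the exact normalization steps are assumptions for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-identical strings collide."""
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Remove punctuation and control characters.
    text = re.sub(r"[^\w\s]", "", text)
    # Split on whitespace, drop duplicate tokens, and sort them.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values by fingerprint; keep groups with more than one distinct value."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical sample values.
names = ["Müller, Hans", "hans  MULLER", "Anna Schmidt"]
clusters = cluster(names)
```

Here the first two names share the key `hans muller` and are reported as one cluster, while the third stays on its own.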

Benjamin: Archival and research workflows

Benjamin presented several workflows used in archival and research contexts:

  1. OCR → data review → OpenRefine → creation of structured data → publication

  2. NER → reconciliation → enrichment as linked data

  3. Manual data collection → cleaning / deduplication → reconciliation → publication

Related blog posts describing these workflows and projects: