2025 Barcamp Session Proposal: Trainers’ Tips & Experiences for Effective OpenRefine Training

Description
Building on last year’s “OpenRefine 2024 Barcamp: Approaches to Training People to Use OpenRefine”, this session brings together experienced trainers to share practical tips, materials, and lessons learned when teaching OpenRefine to diverse audiences. We’ll discuss designing hands-on exercises, managing mixed-skill groups, leveraging remote training tools, and setting up follow-up support structures (office hours, community calls, etc.). Whether you’re new to training or a seasoned educator, come contribute your insights and take away a richer toolkit for your next workshop.

Format
Interactive roundtable / peer-exchange (demo + discussion).

Session goals

  • Surface and compare best practices for structuring an OpenRefine training agenda
  • Exchange ready-to-use materials (slide decks, sample datasets, “bring your own data” guides)
  • Share strategies for handling varied participant skill levels and backgrounds
  • Explore tools and techniques for engaging virtual audiences (Zoom features, breakout rooms, Q&A formats)
  • Discuss models for ongoing learner support (webinars, office hours, community calls)

Please indicate in the thread below if you are interested in presenting or participating as a trainer.

Hello,

I’m Irving Huerta, a lecturer in Journalism at City University in the UK. I’d like to join this session as a participant and potentially present on how I’ve taught OpenRefine to my students.

I’m sorry I can’t join on the Monday, but I’ll be happy to contribute on the other days.

Best,

Irving


Etherpad from the call: Etherpad

Cleaned-up notes from the pad

Roundtable of trainers’ experiences

Irving

Irving described teaching OpenRefine primarily to journalists working on investigations, as well as people working in cultural heritage contexts.

He highlighted several training resources used in data journalism education:

These materials are helpful references for teaching OpenRefine in journalism training programs.

Julie

Julie teaches OpenRefine mainly to graduate students and librarians, often in science and engineering contexts.

One strategy that works well in her training is using personal stories about time saved with OpenRefine compared with manual spreadsheet cleaning. These examples help demonstrate the practical value of the tool.

She also emphasized the importance of selecting datasets relevant to participants’ work, which increases engagement and helps learners immediately see how OpenRefine applies to their own data.

Example infrastructure that could support hosted environments for workshops: https://github.com/ComputeCanada/magic_castle

Jan

Jan described working mostly with the Wikimedia community. These sessions are often less like formal training and more like interactive demonstrations or livestream-style workshops, where participants ask questions as the session goes along.

For Wikimedia workflows, it is possible to run OpenRefine in the browser through PAWS, the Wikimedia cloud environment: https://wikitech.wikimedia.org/wiki/PAWS

This can help avoid installation issues during training sessions.

Benjamin

Benjamin shared experiences teaching OpenRefine to several groups:

  • Archivists through internal workshops
  • New archivists as part of onboarding training programs
  • Librarians and archivists in the ZB Zürich master’s program
  • Digital humanities and history students, including summer schools
  • GLAM professionals through conference workshops and online “brown bag” lecture programs

Training materials and datasets used in some of these workshops are available here:

https://fdmlab.landesarchiv-bw.de/workshops/

One approach that worked well was scheduling two training sessions separated by some time, allowing participants to practice and return with questions about topics they want to explore further.

Benjamin also recommended preparing web-accessible OpenRefine instances as a backup in case participants encounter installation problems.

Louise

Louise trains several groups including:

  • university collection managers
  • librarians and museum collection managers
  • researchers and students

Training formats include:

  • online workshops
  • in-person workshops (including barcamps and conferences)
  • open educational resources (OER)

She is currently developing OER training material using LiaScript: https://liascript.github.io/course/?https://raw.githubusercontent.com/soda-collections-objects-data-literacy/OpenRefine-Beginner-Tutorial/main/SODa-OpenRefine-Beginner-Tutorial.md#1

For exercises, she uses demo datasets where errors were intentionally introduced, allowing participants to practice solving realistic data cleaning problems.
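As an illustration of how such an exercise dataset could be prepared, here is a minimal Python sketch that injects typical inconsistencies (case variants, stray whitespace, doubled spaces) into a clean table. It is not Louise's actual method: the function names `add_noise` and `make_messy`, the `institution` column, and the sample rows are all hypothetical, and the error types are an assumption about what learners would be asked to fix with trimming, case transforms, and clustering.

```python
import random

def add_noise(value, rng):
    """Return a messier variant of a string (hypothetical helper).

    Each variant is something OpenRefine's trim, case, or clustering
    features could plausibly be used to repair.
    """
    variants = [
        lambda s: s.upper(),                 # inconsistent capitalisation
        lambda s: s.lower(),
        lambda s: "  " + s + " ",            # leading/trailing whitespace
        lambda s: s.replace(" ", "  ", 1),   # doubled internal space
    ]
    return rng.choice(variants)(value)

def make_messy(rows, column, rate=0.3, seed=42):
    """Copy rows, corrupting `column` in roughly `rate` of them.

    A fixed seed keeps the exercise file reproducible between workshops.
    """
    rng = random.Random(seed)
    messy = []
    for row in rows:
        new_row = dict(row)  # leave the clean source rows untouched
        if rng.random() < rate:
            new_row[column] = add_noise(row[column], rng)
        messy.append(new_row)
    return messy

# Hypothetical sample data, invented for this sketch.
clean_rows = [
    {"id": "1", "institution": "City Library"},
    {"id": "2", "institution": "National Museum"},
    {"id": "3", "institution": "University Archive"},
    {"id": "4", "institution": "City Library"},
]

messy_rows = make_messy(clean_rows, "institution", rate=0.5)
for row in messy_rows:
    print(repr(row["institution"]))
```

Keeping the clean rows alongside the messy copy gives the trainer an answer key: participants can compare their cleaned column against the original values at the end of the exercise.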

Louise also attempted to organize follow-up support calls after training sessions, but participation dropped after the first sessions.

Tom

Tom described training multiple cohorts of Harvard librarians through the Data Scientist Training for Librarians (DST4L) program.

He also teaches OpenRefine as part of introductory data-wrangling training for data scientists, addressing a broader audience beyond libraries.

Training practices discussed

Across these experiences, several practices emerged:

  • Use domain-relevant datasets to make exercises meaningful.
  • Include hands-on practice, ideally with enough time for participants to experiment.
  • Allow time between sessions so learners can practice independently.
  • Prepare backup environments or hosted installations to avoid installation issues.

For hands-on workshops, the group suggested allocating at least 3–4 hours, ideally including a break.