Cleaned-up notes from the pad
Roundtable of trainers’ experiences
Irving
Irving described teaching OpenRefine primarily to journalists working on investigations, as well as people working in cultural heritage contexts.
He highlighted several training resources used in data journalism education, which serve as helpful references for teaching OpenRefine in journalism training programs.
Julie
Julie teaches OpenRefine mainly to graduate students and librarians, often in science and engineering contexts.
One strategy that works well in her training is using personal stories about time saved with OpenRefine compared with manual spreadsheet cleaning. These examples help demonstrate the practical value of the tool.
She also emphasized the importance of selecting datasets relevant to participants’ work, which increases engagement and helps learners immediately see how OpenRefine applies to their own data.
Example infrastructure that could support hosted environments for workshops: https://github.com/ComputeCanada/magic_castle
Jan
Jan described working mostly with the Wikimedia community. These sessions tend to be interactive demonstrations or livestream-style workshops rather than formal training, with participants asking questions throughout.
For Wikimedia workflows, OpenRefine can be run in the browser through PAWS, Wikimedia's hosted Jupyter notebook environment: https://wikitech.wikimedia.org/wiki/PAWS
This can help avoid installation issues during training sessions.
Benjamin
Benjamin shared experiences teaching OpenRefine to several groups:
- Archivists through internal workshops
- New archivists as part of onboarding training programs
- Librarians and archivists in the ZB Zürich master’s program
- Digital humanities and history students, including summer schools
- GLAM professionals through conference workshops and online “brown bag” lecture programs
Training materials and datasets used in some of these workshops are available here:
https://fdmlab.landesarchiv-bw.de/workshops/
One approach that worked well was scheduling two training sessions separated by some time, allowing participants to practice and return with questions about topics they want to explore further.
Benjamin also recommended preparing web-accessible OpenRefine instances as a backup in case participants encounter installation problems.
Louise
Louise trains several groups, including:
- University collection managers
- Librarians and museum collection managers
- Researchers and students
Training formats include:
- Online workshops
- In-person workshops (including barcamps and conferences)
- Open educational resources (OER)
She is currently developing OER training material using LiaScript: https://liascript.github.io/course/?https://raw.githubusercontent.com/soda-collections-objects-data-literacy/OpenRefine-Beginner-Tutorial/main/SODa-OpenRefine-Beginner-Tutorial.md#1
For exercises, she uses demo datasets where errors were intentionally introduced, allowing participants to practice solving realistic data cleaning problems.
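As an illustration of how such a practice dataset could be prepared, here is a minimal Python sketch that injects common data-entry errors into clean values. It is not Louise's actual method: the function name `introduce_errors`, the `rate` and `seed` parameters, and the sample values are all hypothetical. The chosen errors (stray whitespace, inconsistent capitalization) are ones that OpenRefine's trim, collapse-whitespace, and case transforms can repair.

```python
import random


def introduce_errors(values, rate=0.3, seed=0):
    """Return a copy of `values` with data-entry errors injected.

    Hypothetical helper for building workshop exercises.
    `rate` is the probability that a given value is corrupted;
    `seed` makes the result reproducible across workshop runs.
    """
    rng = random.Random(seed)
    corruptions = [
        lambda s: s + " ",               # trailing whitespace
        lambda s: " " + s,               # leading whitespace
        lambda s: s.upper(),             # inconsistent capitalization
        lambda s: s.lower(),
        lambda s: s.replace(" ", "  "),  # doubled internal spaces
    ]
    messy = []
    for value in values:
        if rng.random() < rate:
            value = rng.choice(corruptions)(value)
        messy.append(value)
    return messy


if __name__ == "__main__":
    # Sample values are invented for illustration only.
    clean = ["Natural History Museum", "City Archive", "University Library"]
    for original, corrupted in zip(clean, introduce_errors(clean, rate=0.8)):
        print(repr(original), "->", repr(corrupted))
```

Fixing the seed means every participant receives the same messy file, so the trainer can prepare a matching solution in advance.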
Louise also attempted to organize follow-up support calls after training sessions, but participation dropped after the first sessions.
Tom
Tom described training multiple cohorts of Harvard librarians through the Data Scientist Training for Librarians (DST4L) program.
He also teaches OpenRefine as part of introductory data-wrangling training for data scientists, addressing a broader audience beyond libraries.
Training practices discussed
Across these experiences, several practices emerged:
- Use domain-relevant datasets to make exercises meaningful.
- Include hands-on practice, ideally with enough time for participants to experiment.
- Allow time between sessions so learners can practice independently.
- Prepare backup environments or hosted installations to avoid installation issues.
For hands-on workshops, the group suggested allocating at least 3–4 hours, ideally including a break.