In this session (see Barcamp page), trainers shared their experiences and materials, emphasizing the importance of tailored content and practical exercises, such as "bring your own data" sessions. We discussed strategies for effective teaching, including the structure of training sessions, managing different skill levels, and providing ongoing support through office hours and webinars.
Roundtable of Trainers' Experiences
We began with a roundtable where trainers shared their experiences and materials for providing OpenRefine training.
Owen Stephens
Materials written by @ostephens for British Library training:
- Introduction
- Advanced
- Dataset for the BL: a sample of 1,000 metadata records from BL images on Flickr
- Library Carpentry course
Curriculum structure:
- Start with the general grid interface and how to use filter/facet
- Clustering to get the "aha" moment early on, explaining how clustering works
- Renaming and reordering columns: more complicated than in spreadsheets
- Sorting
- Undo/redo
- Datatype handling and arrays: challenging for students after several hours of training
- Exporting data
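To explain how clustering works, it can help to show the idea behind OpenRefine's default clustering method, key collision with the fingerprint keying function: each value is reduced to a normalized key, and values that share a key are grouped together. The Python below is a simplified teaching model of that idea, not a port of OpenRefine's implementation; the helper names and sample values are made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint key (a sketch, not OpenRefine's exact code)."""
    value = value.strip().lower()
    # Drop accents: decompose characters, then remove the combining marks.
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Remove punctuation (anything that is not a word character or whitespace).
    value = re.sub(r"[^\w\s]", "", value)
    # Sort the unique whitespace-separated tokens and join them back together.
    return " ".join(sorted(set(value.split())))


def key_collision_clusters(values):
    """Group values whose fingerprints collide; keep groups of 2 or more."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical sample values, not taken from the BL dataset.
names = ["British Library", "british library.", "Library, British", "Bodleian Library"]
print(key_collision_clusters(names))
# [['British Library', 'british library.', 'Library, British']]
```

Showing students why "Library, British" and "british library." end up in the same cluster tends to make the later choice between merging and keeping values easier to discuss.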
Other sessions after the introduction cover:
- Retrieving data from a web source
- Parsing JSON with parseJson()
- Reconciliation
- Looking up values in other projects with cross()
- Regular expressions
- Using row.index with a custom facet to limit the view to 10 records
Not covered: non-text datatypes and complex GREL expressions (forEach, isBlank, ...).
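The row.index trick works because a custom facet evaluates an expression once per row and groups rows by the result. In GREL the facet expression would be row.index < 10; since row.index starts at 0, selecting the true bucket leaves the first ten rows. The Python below is only a rough model of that mechanism, not OpenRefine code: custom_facet and the generated records are hypothetical stand-ins.

```python
from collections import defaultdict


def custom_facet(rows, expression):
    """Bucket row indices by the value `expression` returns for each row.

    Hypothetical helper modelling an OpenRefine custom facet: the
    expression runs once per row and rows are grouped by its result.
    """
    buckets = defaultdict(list)
    for index, row in enumerate(rows):
        buckets[expression(index, row)].append(index)
    return dict(buckets)


# Hypothetical stand-in for a 1,000-record sample dataset.
rows = [{"title": f"Record {i}"} for i in range(1000)]

# Mirrors the GREL facet expression row.index < 10 (indices start at 0).
facet = custom_facet(rows, lambda index, row: index < 10)
selected = [rows[i] for i in facet[True]]
print(len(selected), len(facet[False]))  # 10 990
```

Because a facet only filters what is shown, removing it brings back all rows; the underlying data is not changed.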
Comments: Managing students with different skill levels and backgrounds is challenging, and the hardest topics often come last. Owen also emphasizes the risk of losing information when cleaning data.
Yael Netzer
@yaeln's training sessions usually take place in a single day. She shared her materials from:
- Potsdam Summer 2023
- Using OpenRefine for scraping
- An example of a full-week summer school: Distance Reading in Catalogues
- Useful functions
Tom Morris
@tfmorris shared his experience with a 2-day OpenRefine training followed by a 1-day introduction to Python. Two days seems to be the sweet spot.
He also shared slides from the earlier Data Scientist Training for Librarians (DST4L) course for Harvard librarians.
Alicia Fagerving
@Alicia_Fagerving: When training people to use OpenRefine with Wikidata, it is important that students already know how to edit Wikidata manually and how to write SPARQL queries; otherwise, there is too much to learn in one session.
@Alicia_Fagerving's training materials:
- A longer project by Wikimedia Sverige with two museums, including training on OpenRefine and Wikidata
- Video tutorials (in Swedish) produced by the project participants
Landesarchiv Baden-Württemberg
@mack-v: If students use OpenRefine for a specific project, they tend to ignore other OpenRefine features and focus only on what is relevant to their workflow. This makes it hard to run a generic workshop.
@b2m: The trainer doesn't share their screen; instead, one student shares theirs. This slows the pace and frees the trainer to concentrate on transferring knowledge.
Training materials in German: Workshops
Julie Faure Lacroix
@jfaurel uses a dataset about bed bugs to catch everyone's attention. The dataset is messy, and she presents GREL at every step of the process, matching point-and-click features with their GREL equivalents. She introduces a chain of replace().replace() calls to remove accents from characters in a French dataset, and uses API queries to get the elevation of places in the dataset. She does not cover reconciliation.
She emphasizes the importance of understanding data bias and recognizing that a dataset does not show the full picture.
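In GREL, such a chain looks like value.replace("é", "e").replace("è", "e"), with each call applied to the result of the previous one. The Python sketch below mimics that chain with a loop over a replacement table; the REPLACEMENTS list and the sample string are made up for illustration and cover only a few lowercase French accents.

```python
# Hypothetical replacement table: extend it with the accented characters
# (including uppercase ones) that actually occur in your data.
REPLACEMENTS = [
    ("é", "e"),
    ("è", "e"),
    ("ê", "e"),
    ("à", "a"),
    ("ç", "c"),
    ("ô", "o"),
]


def strip_accents(value):
    # Same effect as chaining value.replace(...).replace(...) in GREL:
    # each pair is applied, in order, to the output of the previous step.
    for accented, plain in REPLACEMENTS:
        value = value.replace(accented, plain)
    return value


print(strip_accents("punaise de lit à Montréal"))  # punaise de lit a Montreal
```

An explicit chain makes every substitution visible to students, at the cost of listing each character; characters missing from the table (such as "É" here) pass through unchanged.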
Sharing Knowledge
Trainers exchanged tips on how to run a course successfully.
Course Schedule
@tfmorris: Teachers should be careful not to include too much content in a single session.
@b2m: Hold training on separate days, so people who only need the basics can attend just the first session, and a second session can cover advanced topics.
@b2m: Start the afternoon with "aha" moments (e.g., clustering) to regain students' attention.
Bring Your Own Data & Office Hours
@tfmorris ran "bring your own data" sessions for users to practice with their own data.
@AtesComp organized periodic webinars where people could join to get help with their data or to present topical uses of OpenRefine.
@ostephens: Tried the "bring your own problems/dataset" approach but found it difficult to get people to share.
@ej2432 organized Wikibase working hours: 1-hour meetings where users could present specific use cases for Wikibase or the features they rely on.
@Alicia_Fagerving held office hours to answer users' questions and used them as an opportunity to introduce GREL for specific tasks. Museum professionals later published their own screencasts showing how to solve typical problems in their community.
Next Steps: Scheduling Regular Calls Between OpenRefine Trainers?
@ej2432 suggested holding regular sessions where trainers can share tips.
@mack-v mentioned that Wikimedia Germany holds such sessions on an irregular basis.
@martin noted that The Carpentries also have regular sessions (today at 5 PM).