OpenRefine 2024 Barcamp: Approaches to Training People to Use OpenRefine

In this session (see Barcamp page), trainers shared their experiences and materials, emphasizing the importance of tailored content and practical exercises, such as "bring your own data" sessions. We discussed strategies for effective teaching, including the structure of training sessions, managing different skill levels, and providing ongoing support through office hours and webinars.

Roundtable of Trainers' Experiences

We began with a roundtable where trainers shared their experiences and materials for providing OpenRefine training.

Owen Stephens

Materials written by @ostephens for British Library training:

Curriculum structure:

  • Start with the general grid interface and how to use filters and facets
  • Clustering early on to deliver an "aha" moment, with an explanation of how clustering works
  • Renaming and reordering columns (more complicated than in spreadsheets)
  • Sorting
  • Undo/redo
  • Datatype handling and arrays (challenging after several hours of training)
  • Exporting data
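Because the curriculum explains how clustering works, it can help to show what OpenRefine's default key-collision method does behind the "aha" moment. The sketch below is a minimal Python approximation, assuming the "fingerprint" keying steps described in the OpenRefine clustering documentation (trim, lowercase, strip punctuation, fold accents to ASCII, sort and deduplicate tokens). The function names `fingerprint` and `key_collision_clusters` and the sample names are hypothetical; this is not OpenRefine's own implementation.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate the 'fingerprint' key used by key-collision clustering."""
    s = value.strip().lower()
    # Drop punctuation and control characters (keep word characters and whitespace).
    s = re.sub(r"[^\w\s]", "", s)
    # Fold accented characters to plain ASCII, e.g. "è" -> "e".
    # Characters with no ASCII decomposition are simply dropped in this sketch.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Split into tokens, remove duplicates, sort, and join back together.
    return " ".join(sorted(set(s.split())))


def key_collision_clusters(values):
    """Group values whose fingerprints collide; keep groups with 2+ distinct values."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    # Hypothetical messy values, as might appear in a library catalogue column.
    names = [
        "British Library",
        "british library ",
        "Library, British",
        "Bibliothèque nationale",
        "Bibliotheque Nationale",
        "Wikidata",
    ]
    for cluster in key_collision_clusters(names):
        print(cluster)
```

Run on these sample values, the three "British Library" variants and the two "Bibliothèque nationale" spellings each collapse into one cluster, while "Wikidata" has no partner and is not listed. Word order and accents stop mattering, which is usually the point where the "aha" lands.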

Other sessions after the introduction cover:

  • Retrieving data from a web source
  • parseJSON
  • Reconciliation
  • cross() for looking up values in another project
  • Regular expressions
  • Using row.index with a custom facet to limit to 10 records
  • Not covered: non-text datatypes or complex GREL expressions (forEach, isBlank, ...)

Comments: Managing students with different skill levels and backgrounds is challenging, and the hardest topics are often covered last. He also emphasizes the risk of losing information when cleaning data.
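The row.index trick from the session list works because a custom facet groups rows by whatever value its expression returns. In OpenRefine one would add a custom text facet with the GREL expression `row.index < 10` and select the `true` choice. The Python sketch below only mimics that grouping logic, assuming zero-based row indices; `custom_facet` and the sample rows are hypothetical.

```python
from collections import defaultdict


def custom_facet(rows, expression):
    """Group row indices by the value an expression returns, as a custom facet does."""
    buckets = defaultdict(list)
    for index, row in enumerate(rows):
        buckets[expression(index, row)].append(index)
    return dict(buckets)


# Hypothetical project: 25 rows, each with a single "title" cell.
rows = [{"title": f"Record {i}"} for i in range(25)]

# Mirrors the GREL custom facet expression: row.index < 10
# (row indices are assumed to start at 0, so this selects the first 10 rows).
facet = custom_facet(rows, lambda index, row: index < 10)

# Selecting the "true" choice leaves only the first 10 rows to work on.
sample = [rows[i] for i in facet[True]]
print(len(sample), "rows selected out of", len(rows))
```

Limiting the view this way lets learners try expressions on a small, predictable subset before applying them to the whole project.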

Yael Netzer

@yaeln's training sessions usually take place in a single day. She shared her materials from:

Tom Morris

@tfmorris shared his experience with a 2-day OpenRefine training that led into a 1-day Python introduction. Two days seem to be the sweet spot.
He also shared slides from the old Data Scientist Training for Librarians (DST4L) course for Harvard librarians.

Alicia Fagerving

@Alicia_Fagerving: When training people to use OpenRefine with Wikidata, it is important that students already know manual editing in Wikidata and SPARQL; otherwise, it is too much to learn in one session.

@Alicia_Fagerving's training materials:

Landesarchiv Baden-Württemberg

@mack-v: When users or students use OpenRefine for a specific project, they tend to ignore the rest of OpenRefine and focus only on what is relevant to their workflow. This makes it hard to run a generic workshop.

@b2m: The trainer doesn't share their screen; instead, one student shares theirs. This slows the pace and frees the trainer to focus on transferring knowledge.

Training materials in German: Workshops

Julie Faure Lacroix

@jfaurel uses a dataset about bed bugs to capture everyone's attention. The dataset is messy, and she presents GREL at every step of the process, matching point-and-click features with their GREL equivalents. She introduces a chain of replace().replace() calls to remove accented characters from a French dataset, and she uses API queries to retrieve the elevation of places in the dataset. She does not cover reconciliation.

She emphasizes the importance of understanding data bias and of recognizing that a dataset does not show the full picture.
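The replace().replace() chain for accented characters applies one substitution after another, as in a GREL expression such as `value.replace("é", "e").replace("è", "e")`. The Python sketch below mirrors that idea with a loop over a mapping; the `FRENCH_ACCENTS` table and `remove_accents` name are hypothetical illustrations, not the trainer's actual expression.

```python
# Hypothetical mapping of common lowercase French accented characters.
FRENCH_ACCENTS = {
    "à": "a", "â": "a", "ç": "c", "é": "e", "è": "e", "ê": "e", "ë": "e",
    "î": "i", "ï": "i", "ô": "o", "ù": "u", "û": "u", "ü": "u",
}


def remove_accents(value: str) -> str:
    """Apply one replacement after another, like links in a GREL replace chain."""
    for accented, plain in FRENCH_ACCENTS.items():
        value = value.replace(accented, plain)
    return value


print(remove_accents("punaises de lit à Montréal"))
```

Like the GREL chain, this only handles the characters that are listed, so uppercase forms such as "É" would need their own entries. That explicitness is what makes the chain easy to follow in a training session, even if it grows long.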

Sharing Knowledge

Trainers exchanged tips on how to run a course successfully.

Course Schedule

@tfmorris: Teachers should be careful not to include too much content in a single session.
@b2m: Schedule training on separate days: a basic session that people can attend on its own if that level is all they need, then a second session for advanced topics.
@b2m: Start the afternoon with "aha" moments (e.g., clustering) to regain students' attention.

Bring Your Own Data & Office Hours

@tfmorris ran "bring your own data" sessions for users to practice with their own data.
@AtesComp organized periodic webinars where people could join to get help with their data or present topical use cases.
@ostephens: Tried the "bring your own problems/dataset" approach but found it difficult to get people to share.
@ej2432 organized Wikibase working hours: 1-hour meetings where users could present specific use cases or the features they use Wikibase for.
@Alicia_Fagerving provided office hours to answer user questions and used this opportunity to introduce GREL to achieve specific tasks. Museum professionals later published their own screencasts on solving typical problems in their community.

Next Steps: Scheduling Regular Calls Between OpenRefine Trainers?

@ej2432 suggested having regular sessions between trainers to share tips.
@mack-v mentioned Wikimedia Germany has irregular sessions.
@martin noted that The Carpentries also have regular sessions (today at 5 PM).