Your most successful approach in teaching OpenRefine to newcomers?

I give OpenRefine introductions regularly (maybe once every two months). Usually, I am the only trainer, in either an in-person or an online (Zoom) setting. I usually end up doing a slow and deliberate two-hour demo of OpenRefine’s features, leaving room for people’s questions of course. I also emphasize that people should do online searches to find tips/workflows for the specific tasks they want to do, and that we have good/extensive documentation. I try to give a clear message that there are a lot of how-tos and documentation ‘out there’ to help them.

However, I’m not sure if that’s the best approach. I’d like to end up with an outcome of ‘as many trainees as possible feeling confident to give OpenRefine a whirl and then train themselves further’.

What have been your most successful approaches in training people to use OpenRefine, and why?

P.S. I’m usually the only trainer in the room, with a group of up to a dozen learners (no assistance/buddy). I have tried a more ‘hands on’ approach, asking people to do what I’m doing or to do small tasks on their laptops, but then I always get stuck because someone (or multiple people in the room) will have trouble installing OpenRefine (even if I asked them to install the software beforehand…), or will get stuck with very simple things for which they need me to look over their shoulders, stalling the entire workshop. This is why I resorted to the ‘passive demo’ approach. I’m curious to hear what other trainers do in a similar situation.


I work from the Library Carpentry OpenRefine lesson. It has been successful for larger and smaller groups, in person and on Zoom. I have taught lessons between 90 and 120 minutes. If the group of learners is larger than two or three, I find it essential to have someone else helping me with student questions about installation, errors, etc.


I am indeed starting to realize that I’d best have a helper at all times. But I don’t have someone at the ready for that in most cases… Maybe I could pre-record myself giving the basic explanations, and then help people while the video runs.

Any solo trainers here who have found a good mode of doing it on their own anyway?

I have now also asked a group to do the Library Carpentry lessons beforehand, autonomously, so that my presence will be focused on answering specific questions and troubleshooting any problems they may have. That workshop takes place next week and I’m curious to see how that will go.

I have a few different approaches I use.

Ensuring that participants have got OpenRefine installed and working in advance is definitely a big thing - I very highly recommend giving out instructions in advance and emphasising the importance of sorting this before the class.
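
As an illustration of what a pre-class check could look like, here is a minimal sketch that participants could run after starting OpenRefine. It assumes OpenRefine is running locally on its default address, http://127.0.0.1:3333/; the function name `openrefine_is_running` is made up for this example, and the check only confirms that something answers at that address, not that it is a correctly working OpenRefine.

```python
import urllib.error
import urllib.request

# Assumption: OpenRefine's default local address. Change this if it was
# started on a different host or port.
DEFAULT_URL = "http://127.0.0.1:3333/"


def openrefine_is_running(url: str = DEFAULT_URL, timeout: float = 3.0) -> bool:
    """Return True if an HTTP 200 response comes back from the given URL.

    This only shows that some web server answers at that address; it does
    not verify that the server is actually OpenRefine.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error status, and similar failures.
        return False


if __name__ == "__main__":
    if openrefine_is_running():
        print("OpenRefine is reachable. You are ready for the workshop.")
    else:
        print("Could not reach OpenRefine. Start it and try again, "
              "or ask for help before the session.")
```

Asking people to report back the printed message before the session would at least show who still needs help with installation.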

I’ve taught the Library Carpentry course and variations of it many times. I’m biased but I think it’s a good overall introduction 🙂
I’m definitely interested to hear how your group doing the lesson independently goes!

I regularly deliver training for British Library staff on how to use OpenRefine, and the approach I’ve taken has changed over time. I used to do a one-day training session (approx. 4 hours), which I documented at Working with Data using OpenRefine | Overdue Ideas - that’s got the lesson outline and a link to the materials.

I think that worked pretty well as a face-to-face course, but with the Covid pandemic, we moved to an online format and that required a new approach. So since 2020, instead of a one-day course, I do that course as a series of online sessions delivered over one week. In the online sessions (90-120 mins each) I introduce concepts and demonstrate things - and then between sessions there is a series of online exercises that participants try based on what I’ve already shown them. At the start of the next online session there is an opportunity for them to ask questions or go over any challenging aspects of the exercises, and then I move on to the next topic. The last online session is used to answer questions and look at any particular aspects participants want to go over.

I think this online version of the course, spread out over a longer period of time, is a good format: participants are able to self-pace and also, of course, have all the exercises and documentation at the end of the course that they can repeat and refer to. The materials I use for this course are available at Introduction to OpenRefine online course – Google Disk

Hope some of this is of help!

We used to do it in person but now I’ve been teaching exclusively online. I usually have a couple of helpers but they haven’t been necessary since we switched to Zoom (probably a bad sign or am I getting super good??).

The way we work now is we have a completely virtual JupyterLab environment that runs OpenRefine on one of our cloud clusters (GitHub - ComputeCanada/magic_castle: Terraform modules to replicate the HPC user experience in the cloud). Everyone gets a username and a password at the beginning of the class. I usually do 3h of the basics and 3h of more advanced stuff like working with APIs. Here’s a link to my notes, which also contain information about the dataset I like to use because it’s messy but anonymized.
cq-openrefine/utilisation_openrefine.pdf at master · calculquebec/cq-openrefine · GitHub

Typically, I like to know in advance what field most of my class studies but I found myself defaulting to the bed bug dataset more often than not. The bed bug problem in Montreal made the news a few years back and a lot of people still remember it.

It’s also a great dataset to teach the responsibilities that come with sensitive data once they are cleaned and crossed with other datasets. Because I often combine it with an introduction to data visualization, I show how easy it is to figure out the precise locations of the infestations and how dangerously easy it is to make broader conclusions about immigration and poverty.

So that’s how I do it!