Video: Intro to OpenRefine for Data Cleaning and Reconciliation (Martin Magdinier)

This 1-hour video presents the following:

  1. About OpenRefine (purpose, its user base, and its historical evolution)
  2. Demo (installation, data import, filtering and faceting, clustering, reconciliation)
  3. Community (ways to contribute: code, design, translation, documentation, user support)

OpenRefine is a powerful open-source tool for working with messy data: cleaning it, transforming it, and converting it from one format to another.

The talk has three parts. The first introduces OpenRefine: its purpose, its user base, and its history. The second is a tour of OpenRefine covering download and installation, data import, filtering and faceting, clustering, data cleaning techniques, and reconciliation services. The talk closes with an invitation to join the OpenRefine community and an overview of the ways to contribute: coding, design, translation, documentation, and user support.

About the Speaker

Martin Magdinier is the OpenRefine Project Manager and has been a core contributor since 2013.


Timestamps have been added to the video: Intro to OpenRefine for Data Cleaning


00:00 Data Umbrella introduction
03:35 What is OpenRefine?
05:00 History of OpenRefine (Freebase Gridworks, Google Refine to OpenRefine)
08:33 OpenRefine user base
10:42 Project statistics
11:34 Features of OpenRefine
14:00 Contributing to OpenRefine (use, promote, help, translate, fix, create, design)
19:40 Begin demo: example dataset of Toronto building permits
20:23 Running OpenRefine locally, installation
20:44 Download OpenRefine (Download | OpenRefine)
21:45 Demo: reading in the data
24:15 Demo: export data from OpenRefine
24:38 Demo: working with the data
25:30 Demo: Text facet shows summary of different values
26:45 facet / filter
27:17 combine multiple facets
28:10 text filter
28:40 Cluster algorithm to clean text data (Ex: Fingerprint function, etc)
32:54 Cluster algorithm: n-Gram fingerprint
33:30 Cluster algorithm: Cologne phonetic
34:15 Cleaning: working with numerical data
35:20 find and replace: remove commas in numbers
37:49 working with dates
38:40 doing reconciliations in OpenRefine (merge multiple fields into one field)
41:12 Reconciliation Service: an API
41:32 about the dataset: Bathurst Street from Wiki Foundation
44:00 connect my dataset with Wikipedia data
44:45 Reconciliation service test bench (plus: clean street name data)
47:38 Example: Excel-type code for editing data
55:26 Resources list
56:20 Q: In the Reconciliation service API, which API versions are supported by OpenRefine?
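
For readers who want a feel for the clustering step (28:40) before watching, here is a minimal Python sketch of key-collision clustering with a fingerprint key. It is an approximation of the idea, not OpenRefine's own code: the function names `fingerprint` and `cluster` and the sample values are made up for illustration, and the exact normalization rules in OpenRefine may differ in detail.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized, order-independent key for a text value.

    Assumed steps (an approximation of the fingerprint idea):
    trim, lowercase, fold accents to ASCII, drop punctuation,
    then sort and de-duplicate the whitespace-separated tokens.
    """
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form (e.g. "é" -> "e").
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Remove anything that is not a word character or whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep only groups with 2+ variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Hypothetical street-name variants, loosely inspired by the demo dataset.
    samples = ["Bathurst St.", "bathurst st", " St Bathurst ", "Queen St", "Québec", "Quebec"]
    for group in cluster(samples):
        print(group)
```

Values that differ only in case, punctuation, accents, or word order end up with the same key and fall into one cluster, which is why this method works well as a first pass. The n-gram fingerprint and phonetic methods mentioned at 32:54 and 33:30 use different keys and catch other kinds of variation, such as misspellings.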