I’m Jason Best, Director of Biodiversity Informatics at the Botanical Research Institute of Texas in Fort Worth, Texas, USA. I use OpenRefine for a number of data cleaning and data wrangling processes, including cleaning data generated from community scientists’ transcriptions of plant specimens and wrangling it into compliance with Darwin Core.
I’m looking forward to learning from and sharing with this community!
Hello and welcome Jason!
Hi! I’m Julie, forest scientist, HPC specialist, and lead of a GIS, forest, and climate data-sharing portal in Quebec, Canada. I’ve been using and teaching OpenRefine for some years now. Excited to be part of this! Sandra brought me here and said I should introduce myself, so let’s gooooo
Welcome Julie - really glad to see you here. I’ve always felt that OpenRefine has a lot of untapped potential for GIS data!
Hi, this is Benjamin, a Data Scientist from the Research Data Management Lab at the State Archives of Baden-Württemberg (Germany, FDMLab@LABW).
We discovered OpenRefine in 2021. As a former full-stack developer, my solutions had mostly been code- and/or CLI-based. With OpenRefine we suddenly had a platform that included a GUI and the possibility to run small code snippets. Thanks to the Windows version with an embedded JRE, my non-technical colleagues could immediately start using OpenRefine on their government-issued hardware.
We are currently using OpenRefine for data wrangling tasks and for data enrichment using the Reconciliation Protocol against public and internal services.
I regularly challenge myself to solve a data problem with OpenRefine instead of coding a solution, to find new ways to teach problem solving with OpenRefine.
While teaching our colleagues how to use OpenRefine, we soon discovered that it is hard to remember how to solve certain problems if you do not use it regularly. So I also started writing tutorials in German on solving both small and complex problems with OpenRefine at https://fdmlab.landesarchiv-bw.de .
Welcome Benjamin - great to meet you!
The tutorials look really great (via Google Translate, as I can only remember a small amount of German from my school days). A few years ago, when I was working on a project that used OpenRefine, we set up a wiki to let users document their own instructions/tutorials https://openlibraryfoundation.atlassian.net/wiki/spaces/GOKB/pages/655671/OpenRefine+How-Tos
We included in this wiki what we called “macros”, which were pre-written OpenRefine operation history JSON files that would carry out regular tasks. For example, here is the JSON that renames a set of columns from a common data format we were using to the column names needed in our projects https://openlibraryfoundation.atlassian.net/wiki/spaces/GOKB/pages/655729/Rename+KBART+columns+to+GOKb+columns
Reading your tutorial “Entfernungen zu GPX Tracks mit OpenRefine ermitteln | FDMLab@LABW” (“Determining distances to GPX tracks with OpenRefine”) reminded me of this, as you have a similar step to rename and remove columns after the initial GPX import. I wondered if you’d considered adding some JSON snippets that could be pasted into the project’s Undo/Redo “Apply operation history” dialogue to shortcut those steps in future.
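A minimal sketch of how such a reusable snippet could be generated rather than hand-written, assuming the operation fields `op`, `oldColumnName`, `newColumnName`, `columnName` and `description` match what OpenRefine’s Undo/Redo “Extract” export produces for column renames and removals (worth checking against a history extracted from your own OpenRefine version). The column names in the demo are hypothetical placeholders, not the GPX importer’s real output.

```python
import json


def rename_operations(mapping):
    """Return one column-rename operation per {old_name: new_name} pair.

    Field names are assumed to mirror an operation history extracted from OpenRefine.
    """
    return [
        {
            "op": "core/column-rename",
            "oldColumnName": old,
            "newColumnName": new,
            "description": f"Rename column {old} to {new}",
        }
        for old, new in mapping.items()
    ]


def removal_operations(columns):
    """Return one column-removal operation per column name."""
    return [
        {
            "op": "core/column-removal",
            "columnName": name,
            "description": f"Remove column {name}",
        }
        for name in columns
    ]


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    history = rename_operations({"Column 1": "latitude", "Column 2": "longitude"})
    history += removal_operations(["Column 3"])
    # The printed JSON is what would be pasted into the "Apply operation history" dialogue.
    print(json.dumps(history, indent=2))
```

Keeping the mapping in code (or in a small CSV read by the script) makes it easy to regenerate the snippet when a tutorial’s column names change.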
My main problem with the JSON operations is that it is sometimes hard to see which column names are required or affected by the operations. Also, the error reporting when a step fails is (was?) quite bad. So I was already thinking about a helper tool that would extract a basic Markdown template from the JSON operations, to make documenting and sharing them easier. But as always, time is a limited resource.
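One possible shape for such a helper, sketched in Python: it walks an operation history and lists, per operation, every value stored under a key ending in `columnName` or `columnNames` (which would also pick up facet columns recorded inside `engineConfig`, if present), then renders a Markdown table. Treating those keys as the column references is a heuristic assumption, not a documented OpenRefine rule; some operations may name columns under other keys.

```python
import json


def column_refs(node):
    """Recursively collect values of keys ending in 'columnName' or 'columnNames'.

    Heuristic: assumes OpenRefine operations name their columns with such keys.
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            lowered = key.lower()
            if lowered.endswith("columnname") and isinstance(value, str):
                found.append(value)
            elif lowered.endswith("columnnames") and isinstance(value, list):
                found.extend(v for v in value if isinstance(v, str))
            else:
                found.extend(column_refs(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(column_refs(item))
    return found


def cell(text):
    """Escape pipe characters so text is safe inside a Markdown table cell."""
    return str(text).replace("|", "\\|")


def to_markdown(history):
    """Render an operation history (a list of dicts) as a Markdown table."""
    lines = ["| # | Operation | Columns | Description |", "|---|---|---|---|"]
    for i, op in enumerate(history, start=1):
        # dict.fromkeys drops duplicate column names but keeps first-seen order
        columns = ", ".join(dict.fromkeys(column_refs(op)))
        lines.append(
            f"| {i} | {cell(op.get('op', '?'))} | {cell(columns)} | {cell(op.get('description', ''))} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Tiny hand-written history for illustration; in practice, json.load() the
    # JSON copied from the Undo/Redo "Extract" dialogue.
    sample = json.loads(
        '[{"op": "core/column-removal", "columnName": "Column 3",'
        ' "description": "Remove column Column 3"}]'
    )
    print(to_markdown(sample))
```

The table gives a reader the “which columns does this need?” overview up front; it does nothing about the error-reporting problem, which would still need OpenRefine itself.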
I definitely see that point - in the GOKb project I linked to, we developed an extension that would automatically off the existing ‘macros’ and apply the operations in that macro from a menu-driven interface within the UI. This gave us a bit more control and the ability to offer a slightly nicer interface to the user.
I’m Padraic, a librarian based in Dublin, Ireland. I’ve played around with OpenRefine previously and attended trainings, but never really used it in anger. I have a few draft digital collections where I want to increase the usefulness of the metadata, so I hope to use it quite a bit over the coming weeks and months.
Hi all! It’s fun to see who else is using OpenRefine. I am an information scientist who works in the biodiversity collections community (like @jbest), and I have used OpenRefine extensively myself, as well as developed community-specific trainings. It is one of my very favorite tools, and truly has been a lifesaver for me more than once! I only recently discovered Wikidata and OpenRefine’s Wikidata reconciliation service, plus its other functionality for bulk editing; this is now my newest rabbit hole.
Hi, I’m Rachel Helps. I’m the Wikipedian-in-residence at the Brigham Young University library. Most of my past work has been editing Wikipedia pages related to our collections, or related Commons uploads. I have done some work with Mix’n’match in the past and presented on it at WikiConference North America a few years ago. I did a small upload to Wikidata from OpenRefine of early works by a Mormon woman author named Josephine Spencer. I made a list of things I initially had trouble modeling here. I still haven’t decided which approach I prefer for serial works! I don’t have a library science degree, but 15 years ago I had a job doing data entry and cleanup (back when bubble sheets were still commonly used on campus).
Right now I’m working with data from our Mormon Literature and Creative Arts database to see if I can structure it to put some of it on Wikidata. But of course, in cleaning the data, I’m discovering how much is missing! Maybe that’s something other people can commiserate with, haha.
I’m Tom Morris, project leader emeritus. I started contributing to Gridworks, as OpenRefine was called then, in May 2010. I have over 40 years of professional software engineering experience, over 20 years of product management and technical marketing experience, as well as many years of experience as an agile coach and consultant. In addition to my implementation work, I also used to teach OpenRefine to Harvard staff as part of the DST4L series, and to fellow Bostonians at some of the Data Science Meetups.
My latest professional gig was building the genomics data infrastructure for a pharma company in support of their cell and genetic therapies division, but I’ve built a large variety of things over the years, often from scratch. I’ve also been head of product for an open source software product and multi-year mentor for the Google Summer of Code.
I’m kind of concerned that we went from an audience of 700+ on the old user mailing list to just 90 here, but I guess we’ll see if the others find us eventually.
I’m really enjoying this thread and hearing about the use of OpenRefine in different communities!
Hello, I’m Lucy Hinnie. I currently work as the Wikimedian in Residence at the British Library and as a Digital Skills Wikimedian at Wikimedia UK. I love OpenRefine and its many tips and tricks, but I often get stuck on simple things, so I may need to ask for your patience at various points! At the moment I’m puzzling through some reconciliation tasks to enable bulk uploads of bookbinding information to Wikidata.
Hello Lucy and welcome to the forum
I am Michelle and I work as project lead at Wikimedia Nederland, where I support our GLAM organisations with their content donations to Wikimedia Commons and Wikidata. I am gradually using OpenRefine for more and more tasks, not only for content donations but also for administrative work (it truly is Excel on steroids), and I look forward to learning more tips and tricks from you!
Welcome @Michelle - I hope we can help you as you learn and use OpenRefine!
Hello everyone! I’m Monalika, and I’m pursuing a BTech in Computer Science in India. My main areas of interest are Web Development and Machine Learning, and I am eager to apply my knowledge and skills to make a positive impact in the open-source community.
Glad to be part of the community!
Connect with me over Linkedin: https://www.linkedin.com/in/monalika-patnaik-b38931230
Curious: what format(s), @Jan?