Axiell - using OpenRefine to clean data - typical workflow recipes?

I am currently giving several OpenRefine trainings in the Netherlands (cultural sector, mainly cultural heritage and museums).

Many cultural organizations here use Axiell software, e.g. Axiell Collections or Adlib, for managing and describing their collections. They are interested in using OpenRefine to clean data and then re-import the cleaned version. In workshops, people have asked me for tips specifically on how to do the re-import correctly, and whether OpenRefine can help make that as smooth as possible.

I am starting this thread to collect tips and recipes. If you have done such export > data cleaning with OpenRefine > import workflows before, how do you typically do it? What do you need to pay attention to? What kind of export format do you use?

1 Like

This is a great question - a common use I see for OpenRefine with library data is wanting to take the data from a system (e.g. a library catalogue), clean up the data, and then contribute the cleaned data back to the original system. I worked through an example with library catalogue data in the MARC format in a series of blog posts a few years ago. In that example I used the Templating export method to convert the records to a format called ‘Mnemonic MARC’: A worked example of fixing problem MARC data: Part 5 – OpenRefine and MarcEdit redux | Overdue Ideas

In that post I also reference an example of creating MODS (an XML format sometimes used for library data) with the templating export: Converting Spreadsheets into MODSXML using Open Refine | Digital Scholarship Unit

1 Like

Prior to OpenRefine 3.1, I would have encouraged using TSV for roundtripping because of a few bugs, but those have been addressed (through library updates), and CSV is now a great lightweight format as long as UTF-8 is used end-to-end.
I think we have fewer guarantees of data-formatting resilience in the other formats, because we don't actually have tests for roundtripping. I'd like to see fuller roundtrip tests added to our automated test suite (Cypress).
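For anyone who wants to check the UTF-8 point on their own data before a re-import, here is a minimal sketch in Python. It is not part of OpenRefine or Axiell. It only writes some rows to a CSV file as UTF-8 and reads them back, so you can confirm that accents, quotes, commas and line breaks survive. The function names and the sample rows are made up for illustration.

```python
import csv
import os
import tempfile


def write_csv(path, rows):
    """Write rows to a CSV file, always encoded as UTF-8."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        csv.writer(handle).writerows(rows)


def read_csv(path):
    """Read a UTF-8 CSV file back into a list of rows."""
    with open(path, "r", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))


def survives_roundtrip(rows):
    """Return True if rows are unchanged after a write/read cycle."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "roundtrip.csv")
        write_csv(path, rows)
        return read_csv(path) == rows


# Hypothetical sample data with the usual troublemakers:
# accents, curly quotes, a comma and an embedded line break.
sample = [
    ["object_number", "title"],
    ["1", "Café “De Gouden Leeuw”, Zürich"],
    ["2", "first line\nsecond line, with comma"],
]
```

Calling `survives_roundtrip(sample)` only tells you that the CSV layer itself is lossless. The export from and import into the collection system still need to be configured for UTF-8 separately.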

1 Like

The question is much more specific than recommending the generic export formats, such as CSV/TSV - it’s about the specific data models (and preceding workflows) that will correctly import the data into said systems.

1 Like

I have seen quite some interest around adapting OpenRefine to have bespoke integration for various platforms, along the lines of what we currently have for Wikibase (which is itself along the lines of what we had for Freebase).

For instance there is (was?) GOKb (as explained by Owen), Ontotext Refine or the Snac OpenRefine extension.

All of those involve developing custom integration (either as a fork or an extension of OpenRefine), so this is a significant investment. I think there is a lot of potential to make this easier (having more appropriate extension points, and better extension architecture overall, mostly). But I do not see how we could completely remove the need for any software development there: platforms have their own data models, APIs, and just their own ways of doing things, so it’s unlikely we can have a one-size-fits-all integration.

So, following that line of thought, there could be an Axiell extension with a dedicated exporter tailored to whatever format they can work with, for instance.

1 Like

GOKb no longer uses OpenRefine - at least not directly :crying_cat_face:

1 Like

I said that indeed during a workshop I gave last week :slight_smile:
This community, like any other, will need to assess carefully whether it's worth their effort - can they organize and afford it, does it provide enough benefit, and can they maintain such an extension over the long run?

For now, I'd like to redirect this thread to the topic of recipes - examples, tips, solutions that Axiell users can currently apply for their workflows. If any reader here can share such, I'd be grateful, and I'm sure it will be useful for many colleagues.

I have been asking around via e-mail, and have been doing some online detective work.

For Dutch speakers (heritage organizations using Axiell in Flanders and in the Netherlands): there is now a small OpenRefine working group as part of the larger Adlib / Axiell user group (gebruikersgroep / Netwerk Adlib en Axiell Collections). Their forum has some discussion around the topic, and you can find some of the people involved in this working group: Netwerk Adlib en Axiell Collections

The working group will apparently build upon earlier information and materials collected by meemoo, including these: Publicatie:Open Refine handleiding voor cultureel erfgoed collecties - Cultureel Erfgoed Standaardentoolbox

1 Like

The National Historical Museums in Sweden imports data from OpenRefine into Axiell Collections (via CSV, I think). I know they at least do this for reconciled GeoNames and Wikidata identifiers.

I can forward an email to the right persons in case it’s of interest.

2 Likes

That would be extremely helpful, @abbe98 - especially if they’d be willing to share examples of what a data export from Axiell Collections looks like, how they then transform it in OpenRefine, and in what format they re-import it.

1 Like

To answer your contact’s question, Axiell Collections software has a very good CSV import facility. Basically, each column header needs to be either the tag or the English data dictionary name of the target field. For more occurrences of a specific field, just add a column to the right, with the same tag or name. Do make sure that column is only for that field. You also need a unique identifier for each row, usually object number or record number.

The Collections online help has nice instructions; read them from top to bottom. They're good!

https://help.collections.axiell.com/en/Topics/Import%20data.htm
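To make the column layout described above concrete, here is a rough sketch in Python of building such a CSV outside Axiell. It assumes that each record has one identifier plus fields that may have several occurrences, and it repeats the same column header once per occurrence, side by side. The field names `object_number` and `creator` are placeholders that have not been checked against any Axiell data dictionary, and leaving a cell empty when a record has fewer occurrences is an assumption too, so verify both against the online help before importing.

```python
import csv
import io


def to_import_table(records, id_field, repeatable_fields):
    """Build a header and rows, repeating one column per occurrence."""
    # For each repeatable field, find the largest number of occurrences.
    widths = {
        field: max((len(rec.get(field, [])) for rec in records), default=0)
        for field in repeatable_fields
    }
    # The identifier comes first, then each field name repeated side by side.
    header = [id_field]
    for field in repeatable_fields:
        header.extend([field] * widths[field])

    rows = []
    for rec in records:
        row = [rec[id_field]]
        for field in repeatable_fields:
            values = list(rec.get(field, []))
            # Assumption: missing occurrences are left as empty cells.
            row.extend(values + [""] * (widths[field] - len(values)))
        rows.append(row)
    return header, rows


def to_csv_text(header, rows):
    """Serialise the table to CSV text (save it as UTF-8 when writing a file)."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical records; field names are placeholders, not real Axiell tags.
records = [
    {"object_number": "A1", "creator": ["Smith", "Jones"]},
    {"object_number": "A2", "creator": ["Lee"]},
]
header, rows = to_import_table(records, "object_number", ["creator"])
```

With the sample records, the header contains `creator` twice because the first record has two creators. In OpenRefine itself you would get the same shape by splitting a multi-valued column into several columns and giving them identical names in the exported file.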

Who am I? I’m a member of the board of Network Adlib and Axiell Collections, formerly known as Adlib User Group (Netherlands and Flanders). I’m also a freelance technical support guy for said products, since the end of 2009. (Before that, I worked at their support department.)

Early this year, a working party was formed called OpenRefine and Data Cleaning, which currently has about 16 members. We meet online every first or second Monday at 14:00 CET. We started out on OpenRefine, but we digress all over the place, depending on what questions members come up with.

To join the working party, you or your organisation has to be a paying member of the Network, which is 75€ per year.

There’s MUCH more. Adlib/Axiell also has the Adlib/Axiell Designer application builder tool with an extremely powerful import tool, coupled with a C-like programming language called Adapl. Large-scale database conversion to the Axiell database format is the name of the game. I’m a regular player.

If you’re interested in more, give us a shout.

Cheers,
Rolf

2 Likes

Thank you so much @RolfBly. Am I correct that the above is only for Dutch and Flemish users of the software (and network members)? I can imagine that internationally, more people would be interested :wink:

Hi Sandra, I guess for now the answer is yes. The language is Dutch, plus the platforms are paid for by (Dutch & Flemish) members’ contributions.

That said, I’d suggest everyone internationally who would want to join in on a similar working party, just raise hands y’all & let’s see what we can do.

Meeting online via Zoom is cheap.

The Dutch working party also has a mailing list, a google doc (for useful links & other stuff), and a Whatsapp group.

Signal, Telegram, Mastodon, or this Discourse right here? All fine by me.

1 Like

As far as I'm concerned, Axiell users around the world would be extremely welcome to organize OpenRefine-related discussions here on the OpenRefine forum. It's your space, it's for all OpenRefine users! Same goes for other significant user communities with common use cases!

2 Likes