Hi @trevorstratton1
Yes this is possible using the OpenRefine cross()
function. I’ll describe how to do this below, and I’m always happy to see people using OpenRefine - but I think in this case you could also probably do this work using the “VLOOKUP” function in Excel or Google Sheets.
The cross()
function in OpenRefine works by looking up values from one project in another project and makes the full data from any matching row available to you. Probably the trickiest part of the process is getting the data you retrieve from the cross()
function to layout exactly as you want it in your project
To get started import both spreadsheets into OpenRefine in separate projects. I’ll assume these are called “Identifiers” and “Metadata”
To take an example of filling out the Title column in the “Identifiers” project - in the “Identifiers” project, from the “Title” column you can choose the menu option Edit cells
→ Transform...
and in the Expression use the cross()
function. The cross()
function will look something like:
cells["Identifier (ID)"].cross("Metadata","Identifier (ID)")
This will do a lookup of the identifier from each cell in the “Identifier (ID)” column and and find any rows in the “Metadata” project that have the same value in the “Identifier (ID)” in that project. However by itself this isn’t quite enough. What you get back from this function is an array (list) of matching rows. As long as each Identifier only appears in the Metadata project once, you should only get back a single row in each case, but you may want to check that is the case (i.e. double check no duplicate IDs have been included in the Metadata project). Looking at the test data you’ve supplied I can see the Identifiers “00086_04_01_01_09_001_002-002” appears four times in the “Test Sheet-Metadata” file - so that’s something you may want to resolve before proceeding.
So you need to use an expression like:
cells["Identifier (ID)"].cross("Metadata","Identifier (ID)")[0].cells["Title"].value
This does the cross
(as described above) and then takes only the first row returned (this is what [0]
does) and then cells["Title"].value
grabs the value from the Title column in the row you’ve matched in the “Metadata” project.
Using the method above you’d need to do the same process for each column you want to populate (Title, Creator, Associated People etc. etc.). But because of the way you’ve laid out the “IDs” sheet with all the columns you need to populate already there, there is a shortcut.
Instead of doing the columns one by one you can do the following:
- In the “Identifiers” project click the “All” column dropdown menu (first column in the project always) and select “Transform”
- In the expression box enter
cells["Identifier (ID)"].cross("Metadata","Identifier (ID)")[0].cells[columnName].value
- Click “OK”
- In the “Select columns” dialog, uncheck columns called “Variable A” ,“Variable …” etc. and the “Identifier (ID)” column - so only the metadata columns (Title, Creator etc.) are selected
- Click OK
This will fill out all the columns selected in the “Select columns” dialog based on the equivalent column in the Metatdata project
Hope this helps
Owen