Pipe-delimited cell values, but need to Add Columns From Other Project

A puzzler!

Here is a snippet of my CSV:

nid	title	field_composition_metadata	field_instrumentation_metadata	field_item_author	field_subjects
54137	000.001 - The Anacreontic Song.	11162|11165|11163	11161|11166|11276|11278|11277|11338|11203|11406	the Words by Ralph Tomlinson, Esqr., late President of that Society; [Music attributed to John Stafford Smith (1750-1836)].	13224|1371|13225|7152
54138	000.002 - The Anacreontic Song	11162|11165|11163	11161|11166|11276|11278|11277|11203|11406	the Words by Ralph Tomlinson, Esqr., late President of that Society; [Music attributed to John Stafford Smith (1750-1836)].	13224|1371|13225|13227
54139	000.003 -   B-A-L-T-Baltimore. Song.	11162|11163	11161|11166	Lyric and Music by Harry H. Goldberg.	27|15|34|1013|22|186

Some of these cells have pipe-delimited values, e.g., Subjects. Those values are numeric identifiers pointing to taxonomy values in another table/project, for example the "tid" field in this Subject Taxonomy snippet:

tid,uuid,name,status,revision_id,description__value,description__format,weight,parent_name,parent_tid
13642,f3a9d4d4-46f6-4f7d-9170-1214660a7e2b," Courage",1,13642,,,0,,
1068,c1019bbf-3590-4187-83c3-8f16afde3712," Political parties",1,1068,,,0,,
5730,7fd2d6fa-7691-46dc-87f2-749acbea3c5c," Politicians",1,5730,,,0,,
1169,24bd1ee3-1a4f-4aac-81af-6d6e0d5bb773," Railroad companies",1,1169,,,0,,

I somehow need to replace the numeric IDs in the main dataset with the corresponding alphabetic values from the Taxonomy.

I've done this before -- Add Columns From Other Project -- using the old but very useful VIB-Bits plugin. What's different this time is that there are multiple values in my cell, not just one.

Thinking how best to do this.

Advice appreciated!

Mark

Whatever the solution, it's going to involve forEach() and cross()! I LUV Openrefine.

Found these, major Clues:

So good!

Yes you can build a GREL expression with forEach and cross... but let's save you some headache.

In OpenRefine there is a concept called records (see Row vs. records). Using this concept you could split the multiple values into single values but keep them together as unit using the split multi-valued cells function.

After adding the names for the ids either using cross or the VIB-Bits plugin you can then combine them back together via the join multi-valued cells function.

The advantage of this strategy is, that you transfer the complexity of forEach to a visual approach, where you can also manually handle errors and/or missing values.

1 Like

Oh, thanks so much for this strategy! Reading up on "records" and split/join multivalued cells now. Excited to be learning one more useful thing about OpenRefine.

1 Like

I'm currently working on a similar project (each line has multiple subject headings and I need to add the corresponding identifiers), and this is the way I'm doing it. I've split the subject headings out into lines in records mode, then I do cell cross to add a column for the identifiers. After the identifiers are in, I rejoin the cells in both columns.

2 Likes

Just reporting back...

I got sidetracked this week and just got back to this. B2m and jcorrice: I tried the Record method of splitting/crossing/joining and it WORKED quickly and perfectly! Spinning in chair...

I have a bunch of these to do, but now that I know how to do them -- I'm taking a Saturday Nap.

MUCH appreciated,

Mark

3 Likes