Transpose values from all rows of a certain column into one cell for a new column

LibrErli · November 8, 2023, 7:25pm

Given the following snippet of a table, my idea is to create a new column called pageoverview containing all values from the column test, e.g.

BuchdeckelVorne 1 2 3 4 5

In a second step, I would remove each row's own test value from its pageoverview cell, so that the result would look like

row: 1 2 3 4 5
row: BuchdeckelVorne 2 3 4 5
row: BuchdeckelVorne 1 3 4 5

Any ideas whether that's possible in OpenRefine?

ostephens · November 9, 2023, 11:27am

My first thought is that you could use the "Records" functionality in OpenRefine. To do this you'll need to have a column that has a single identifier for the work/book so that OpenRefine can group all the related rows together into a single "Record" and then you can combine all the information that's in a single column together.

Do you already have a column with such an identifier in it?

If so you should:

Move the column with the work identifier to be the first column in the project
In "Rows" view mode, in this column use Edit cells -> Blank down
Switch to "Records" view mode - you should see the rows grouped together with a record number

from the 'test' column use Edit column -> Add column based on this column. Use the GREL row.record.cells.test.value.join("|") (where test is the column name containing the page numbers). Call the column "pageoverview"

The outcome should be something like

Then in the pageoverview column use Edit cells -> Transform and use the GREL filter(value.split("|"),v,v!=cells.test.value).join("|")

I think this will get the outcome you want

(You can use a different character in the join() if you want the values separated in a different way, although using a space, as in your example, could be problematic if any of the labels in test already contain a space, e.g. "front cover".)
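
For anyone who wants to check the logic outside OpenRefine, here is a minimal plain-Python sketch of the two steps above. It is only an illustration, not OpenRefine code: the list `test_column` is made-up sample data standing in for the test column of a single record, and the variable names are my own.

```python
# Plain-Python illustration of the two GREL steps; not OpenRefine code.
# test_column is hypothetical sample data for one record.
test_column = ["BuchdeckelVorne", "1", "2", "3", "4", "5"]

# Step 1: like row.record.cells.test.value.join("|"),
# every row in the record gets the same joined string.
joined = "|".join(test_column)
pageoverview = [joined for _ in test_column]

# Step 2: like filter(value.split("|"), v, v != cells.test.value).join("|"),
# each row drops its own test value from the joined string.
pageoverview = [
    "|".join(v for v in cell.split("|") if v != own)
    for cell, own in zip(pageoverview, test_column)
]

for own, overview in zip(test_column, pageoverview):
    print(own, "->", overview)
```

Splitting on the same character that was used for the join is what makes step 2 work, which is why a separator that can also occur inside a label would break it.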

b2m · November 9, 2023, 11:45am

Hi Christian,

it is possible but there are some catches:

1. You need a column that uniquely identifies each book.
2. The values in your column "test" already have to be in order.
3. The values in your column "test" should be unique for each book.

You need criterion 1 to be able to group all elements that belong to a book together.
You need criterion 2 to be able to write all the elements that belong to a book into a single cell.
You need criterion 3 to avoid removing the wrong element. (e.g. Cover 1 2 3 Cover => 1 2 3).
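
A tiny plain-Python sketch of why the uniqueness criterion matters, assuming the filter compares values exactly as in the example above. The helper `drop_own_value` is a hypothetical name used only for this illustration.

```python
def drop_own_value(values, own):
    """Remove every element equal to `own`, as a value-based filter does."""
    return [v for v in values if v != own]

# Hypothetical book where the label "Cover" occurs twice.
pages = ["Cover", "1", "2", "3", "Cover"]

# The row for the front cover loses the back cover as well.
print(" ".join(drop_own_value(pages, "Cover")))  # 1 2 3
```

Because the filter compares values and not positions, it cannot tell the two "Cover" entries apart.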

Provided these conditions are fulfilled, you can use the concept of Records in OpenRefine to group all the elements of a book together into one record, and then use the concept of multi-valued cells in OpenRefine to create the desired pageoverview column.

(Owen already described this while I was creating this post. It seems today I am quite the slow typer).

If you want to work more with GREL magic you could also use the cross function in OpenRefine to combine all the transformations in one step:

filter(
    row.cells["ID_COLUMN"].cross("YOUR_PROJECT", "ID_COLUMN").cells["test"].value,
    v,
    v != row.cells["test"].value
).join(" ")

So this GREL expression uses the column ID_COLUMN in the project named YOUR_PROJECT to look up all rows with the same identifier and joins the values from their column test into a string separated by spaces. It also filters out the element that equals the value of the column test in the current row.
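
To make the lookup idea more concrete, here is a rough plain-Python analogue. It is a sketch under assumptions, not what OpenRefine does internally: the sample rows, the `id` key and the function name `pageoverview` are all invented for the illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: two books sharing one project.
rows = [
    {"id": "book-A", "test": "Cover"},
    {"id": "book-A", "test": "1"},
    {"id": "book-A", "test": "2"},
    {"id": "book-B", "test": "Cover"},
    {"id": "book-B", "test": "1"},
]

# Index the test values by identifier, in row order. This plays the role
# of the cross() lookup, which finds all rows with a matching ID.
by_id = defaultdict(list)
for r in rows:
    by_id[r["id"]].append(r["test"])

def pageoverview(row):
    """Join the test values of all rows with the same ID, minus the row's own."""
    siblings = by_id[row["id"]]
    return " ".join(v for v in siblings if v != row["test"])

overviews = [pageoverview(r) for r in rows]
for r, o in zip(rows, overviews):
    print(r["id"], r["test"], "->", o)
```

The point of the identifier column is visible here: without it, the values of book-A and book-B would end up in the same overview.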

LibrErli · November 9, 2023, 12:01pm

Thanks both for your answers, I will try out your ideas. To clarify my dataset: it is an OR project containing 650 records (with only one row each). Each record represents one page of the same book.
(Aim: this project was built to upload the single page images to Commons for a Wikisource project, Category:Rechsteiner Chronik (KB AR Ms 401) - Wikimedia Commons, and the "joined" page column could possibly be used for the pageoverview parameter in the Commons infobox.)

ostephens · November 9, 2023, 12:03pm

Just to be clear about the terminology in OpenRefine a “record” consists of multiple rows - so I think you have (in OpenRefine terms) 1 record and 650 rows. Not the other way round as you write

LibrErli · November 9, 2023, 12:28pm

If I switch between show as "rows" and "records", the data tab shows "690 rows" in one mode and "690 records" in the other. So do I now have 690 rows in one record, or 690 records with one row each?

b2m · November 9, 2023, 1:11pm

OpenRefine uses the concept of records to combine several rows into one data item. Among other things this is used to represent nested data structures like JSON or XML.

To achieve this in OpenRefine the first column is used to determine which rows should be combined to a record.

Title	Page
Book 1	Page 1
Book 1	Page 2
Book 2	Page 1

Assuming the column Title is the first column in the project, from an OpenRefine perspective you now have three rows and three records (as is most likely the case in your project).

If we now use the blank down functionality in OpenRefine on the column Title, we get the following structure:

Title	Page
Book 1	Page 1
	Page 2
Book 2	Page 1

From an OpenRefine perspective we now still have three rows, but only two records (Book 1 and Book 2).
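
The grouping rule can be sketched in a few lines of plain Python. This is a simplified model of the behaviour described above (a non-blank cell in the first column starts a new record), not OpenRefine's actual implementation, and `group_into_records` is a made-up helper name.

```python
def group_into_records(rows):
    """Group rows into records: a non-blank first cell starts a new record."""
    records = []
    for row in rows:
        if row[0] or not records:
            records.append([row])      # start a new record
        else:
            records[-1].append(row)    # blank key: attach to the previous record
    return records

# The two tables from above, before and after "Blank down" on Title.
before = [("Book 1", "Page 1"), ("Book 1", "Page 2"), ("Book 2", "Page 1")]
after = [("Book 1", "Page 1"), ("", "Page 2"), ("Book 2", "Page 1")]

print(len(before), "rows,", len(group_into_records(before)), "records")
print(len(after), "rows,", len(group_into_records(after)), "records")
```

In this model, a project where every row has a value in the first column always has as many records as rows, which matches the "690 rows / 690 records" display.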

LibrErli · November 9, 2023, 2:45pm

Of course I know the difference between row and record in OR (or at least, up to this discussion, I thought I did), thanks. To avoid further misunderstanding, my project is available here: SWITCHdrive

Please take a look at it, and tell me if I am wrong when I say this project has 690 records and 690 rows (1 row per record).

But maybe that's the main problem: for your ideas with all this row.cell... functionality, I should first get my project into a state where it has one record with 690 rows.

LibrErli · November 9, 2023, 2:57pm

thank you for all your explanation and help.

i have added now a new Column with the call number of the book
moved it to the beginning of the project
now i have 1 record with 690 rows.
and the grel command row.record.cells.test.value.join("|") brought the desired result in a new column.

tfmorris · November 9, 2023, 5:16pm

Sounds like you are trying to create web navigation breadcrumbs, perhaps? Now that you've got the record sorted out, this sketch should get you close to what you want:

On column 'test', Edit Cells -> Combine Multi-Value Cells
same column, Edit Cells -> Fill Down
same column, Edit Cells -> Transform... with the expression:
filter(value.split('|'),page,page!=cells['Seite'].value.toString()).join(" ")

This will give you the types of strings that you mentioned in your original note.

Tom
