Collapsing records into rows

Hello friends, I am working with a data set where each entry had multiple lines of data. However, when I ran my document through pdfsandwich and Tabula, it recognized each line of data as its own row. The Records function in OpenRefine shows me my data basically how I want it to look– but does so with multiple rows per record. And when I export the data, there are still multiple rows per record. My question is, how do I collapse the multi-row records into one row each, as the data was originally written? Here is what it looks like in OpenRefine records mode:

As you can see, OpenRefine is correctly grouping information by Report # (which it is calling Type Report # since I had to OCR the original document). How can I get OpenRefine to just create one row per report, with all the associated information for each report squished back into one row like the original spreadsheet was?

Thank you in advance!!

Hi! The short answer is that you’ll want to use Edit cells>Join multi-valued cells in the drop down menu for each column that contains information in multiple rows. It will squish the data that’s contained in multiple rows into a single cell, so make sure you’re picking a good delimiter (I like to use a “|”).

However, you could do a little more work to pull some of that data out into a new column before you perform the join multi-valued cells function, depending on how important the squished data is. For example, if you wanted two different columns for “Imported to RMS” and “Imported to State”, you could text filter or custom facet in rows mode by “RMS”, create a new column using Edit column>Add a column based on this column, and then transform the RMS data in the original column to a null value. Now you have two columns, One for RMS and one for State, and when you perform join multi-valued cells in records mode the blank rows will disappear.

Thank you so much! The data is still messy for other reasons but this did help to consolidate the records that had this problem!