Open Refine not importing Records?

I'm working on a file of survey data with 908 records. The file has 11134 rows. The first column contains the unique ID's, which are three letters followed by numbers and underscores (e.g. "ABC12343_3"). The data is split down so that each record has rows with a unique ID, other important identifying info, and the data for each answer. I am also working in Excel on this spreadsheet, to use pivot tables. I haven't had issues up until now. However...

When I go to import the xlsx spreadsheet to OpenRefine to make changes, OpenRefine does not recognize the records, instead I see 11134 Rows AND Records when I toggle between the views.

What am I doing wrong? How can I fix this? I've been scanning through FAQ and documentation and I'm incredibly confused.

I've tried saving the xlsx file as different formats and importing those (.txt, .csv, etc.) I've tried changing the encoding to UTF-8. I've tried removing the text from the beginning of the Unique ID's. I'm not sure what else to do. It wont recognize my records as records.

Thanks in advance for saving my hair (which is in danger of being torn out)

The way OpenRefine groups rows together into records is based on the content of the first column of your data table. When there is a row which has a value in the first column, OpenRefine will group all subequent rows which have NO value in the first column as part of the same record. So for example:

Column 1 Column 2 Column 3
ABC12343_3 Some data Some more data
ABC12343_3 Other data more other data
ABC12343_4 Even more data even more other data

Here, even though the first two rows have the same "Column 1" value, they don't get grouped as a record because they both have some value in Column. To get them grouped as a record you would need to have:

Column 1 Column 2 Column 3
ABC12343_3 Some data Some more data
Other data more other data
ABC12343_4 Even more data even more other data

Now the second row has an empty first column, OpenRefine will treat it as a part of the same record (when you go to record view) as the row above it.

OpenRefine has a specific function to help get the data from the first example table above to the second, to help move rows into records, which is the "Blank down" option in the "Edit cells" menu. If you have data with repeated values in your first column, you can use "Blank down" to preserve only the occurrence in the first row of a set of repeated values - so "Blank down" on my first example table above, would change the data to the second example table above

OpenRefine doesn't care what is used as the unique key, and the import format also won't be a factor - the only thing OpenRefine cares about when making records is:

  • The first column of the project
  • Whether that column has a value or not for a particular row

So if your data looks like:

Column 1 Column 2 Column 3
ABC12343_3 Some data Some more data
ABC12343_3 Other data more other data
ABC12343_4 Even more data even more other data

Use "Blank down" on the first column, switch to Records mode, and hopefully that will be the fix you need.

If your data ALREADY looks like

Column 1 Column 2 Column 3
ABC12343_3 Some data Some more data
Other data more other data
ABC12343_4 Even more data even more other data

and it still isn't working, the easiest way of diagnosing is if you can share a screenshot or some example data for us to help do more analysis on the issue

Hope this helps

Owen

1 Like

Well thats embarrassing. Thanks for your help, it was the first thing you outlined. I had uniqueID's filled down in the first column.

Really appreciate your help with this, and replying to me so fast. Thanks for keeping OpenRefine running for amateurs like me!

1 Like