I'm working on a file of survey data with 908 records. The file has 11134 rows. The first column contains the unique ID's, which are three letters followed by numbers and underscores (e.g. "ABC12343_3"). The data is split down so that each record has rows with a unique ID, other important identifying info, and the data for each answer. I am also working in Excel on this spreadsheet, to use pivot tables. I haven't had issues up until now. However...
When I go to import the xlsx spreadsheet to OpenRefine to make changes, OpenRefine does not recognize the records, instead I see 11134 Rows AND Records when I toggle between the views.
What am I doing wrong? How can I fix this? I've been scanning through FAQ and documentation and I'm incredibly confused.
I've tried saving the xlsx file as different formats and importing those (.txt, .csv, etc.) I've tried changing the encoding to UTF-8. I've tried removing the text from the beginning of the Unique ID's. I'm not sure what else to do. It wont recognize my records as records.
Thanks in advance for saving my hair (which is in danger of being torn out)
The way OpenRefine groups rows together into records is based on the content of the first column of your data table. When there is a row which has a value in the first column, OpenRefine will group all subequent rows which have NO value in the first column as part of the same record. So for example:
Column 1 |
Column 2 |
Column 3 |
ABC12343_3 |
Some data |
Some more data |
ABC12343_3 |
Other data |
more other data |
ABC12343_4 |
Even more data |
even more other data |
Here, even though the first two rows have the same "Column 1" value, they don't get grouped as a record because they both have some value in Column. To get them grouped as a record you would need to have:
Column 1 |
Column 2 |
Column 3 |
ABC12343_3 |
Some data |
Some more data |
|
Other data |
more other data |
ABC12343_4 |
Even more data |
even more other data |
Now the second row has an empty first column, OpenRefine will treat it as a part of the same record (when you go to record view) as the row above it.
OpenRefine has a specific function to help get the data from the first example table above to the second, to help move rows into records, which is the "Blank down" option in the "Edit cells" menu. If you have data with repeated values in your first column, you can use "Blank down" to preserve only the occurrence in the first row of a set of repeated values - so "Blank down" on my first example table above, would change the data to the second example table above
OpenRefine doesn't care what is used as the unique key, and the import format also won't be a factor - the only thing OpenRefine cares about when making records is:
- The first column of the project
- Whether that column has a value or not for a particular row
So if your data looks like:
Column 1 |
Column 2 |
Column 3 |
ABC12343_3 |
Some data |
Some more data |
ABC12343_3 |
Other data |
more other data |
ABC12343_4 |
Even more data |
even more other data |
Use "Blank down" on the first column, switch to Records mode, and hopefully that will be the fix you need.
If your data ALREADY looks like
Column 1 |
Column 2 |
Column 3 |
ABC12343_3 |
Some data |
Some more data |
|
Other data |
more other data |
ABC12343_4 |
Even more data |
even more other data |
and it still isn't working, the easiest way of diagnosing is if you can share a screenshot or some example data for us to help do more analysis on the issue
Hope this helps
Owen
1 Like
Well thats embarrassing. Thanks for your help, it was the first thing you outlined. I had uniqueID's filled down in the first column.
Really appreciate your help with this, and replying to me so fast. Thanks for keeping OpenRefine running for amateurs like me!
1 Like