Row count mismatch in OpenRefine when importing a file (Excel/CSV/TSV)

Dear all,

I am facing an issue while importing a dataset into OpenRefine. The file is available here:
Google Spreadsheet Link

This dataset is associated with the following journal paper:
https://doi.org/10.1177/01655515251362383

The dataset contains 35,514 data rows plus 1 header row. When I open it in LibreOffice, the row count matches exactly (35,515 in total, including the header).

However, when I import it into OpenRefine, I encounter a discrepancy:

  • During preview, OpenRefine correctly shows 35,515 rows.
  • After creating the project, it only shows 35,149 rows.

So, 365 rows appear to be missing after import.

What I have already tried:

  • Exporting the file from Excel/LibreOffice as CSV (UTF-8) and importing that instead of XLSX.
  • Exporting as TSV and importing.
  • Checking import options (UTF-8 encoding, separator, "parse next line as headers", "store blank rows").
  • Verifying that there are no completely empty rows or embedded line breaks in the data.

Despite these attempts, the problem persists and I cannot get the full 35,514 rows into OpenRefine.

Has anyone experienced a similar issue or can suggest a reliable way to ensure all rows are retained during import?

Thanks in advance for your guidance!

Parthasarathi Mukhopadhyay

Hi @psm

I just tried importing the spreadsheet (downloaded as CSV). After importing I see 35,149 RECORDS, but if I switch the project to ROWS view I see 35,514 ROWS.

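The discrepancy comes from 365 rows that have no data in the “Authors” column, which is the first column. In records mode, OpenRefine treats a row with a blank cell in the first column as a continuation of the record above it, so those 365 rows are folded into preceding records (35,514 - 365 = 35,149). If you move the Title column to be the first column, the number of records and the number of rows match at 35,514, as expected.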
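
To illustrate the rows/records distinction outside OpenRefine, here is a minimal Python sketch. It is a simplified model of the grouping rule, not OpenRefine's actual code; the function name `count_rows_and_records` and the four-row sample are made up for illustration.

```python
import csv
import io

# Hypothetical sample: two rows have a blank first ("Authors") cell.
SAMPLE = """Authors,Title
Smith,Paper A
,Paper B
Jones,Paper C
,Paper D
"""

def count_rows_and_records(csv_text, key_column=0):
    """Return (rows, records) for CSV text with one header line.

    Simplified rule: a row starts a new record when its key-column
    cell is non-empty; a row with an empty key cell is folded into
    the record above it. The very first data row always starts a record.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header line
    rows = 0
    records = 0
    for row in reader:
        rows += 1
        key = row[key_column] if len(row) > key_column else ""
        if key != "" or records == 0:
            records += 1
    return rows, records

# First column (Authors) as the key: blank cells merge rows into records.
print(count_rows_and_records(SAMPLE))                # (4, 2)
# Title as the key: every row has a value, so rows and records agree.
print(count_rows_and_records(SAMPLE, key_column=1))  # (4, 4)
```

Running the same kind of count on the exported CSV before importing should show whether the first column has blank cells, which is what makes the record count drop below the row count.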

Hello Owen

Thanks for pointing this out, as usual. I had wrongly assumed that rows and records would be the same in this case.

Now everything is working perfectly.

Regards

-Parthasarathi
