Dear all,
I am facing an issue while importing a dataset into OpenRefine. The file is available here:
Google Spreadsheet Link
This dataset is associated with the following journal paper:
https://doi.org/10.1177/01655515251362383
The dataset contains 35,514 data rows plus 1 header row. When I open it in LibreOffice, the row count matches exactly (35,515 rows including the header).
However, when I import it into OpenRefine, I encounter a discrepancy:
- During preview, OpenRefine correctly shows 35,515 rows.
- After creating the project, it only shows 35,149 rows.
So 365 data rows (35,514 − 35,149) appear to be missing after import.
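One way to narrow this down is to count the exported file independently of OpenRefine. The sketch below assumes the data has been saved as a UTF-8 CSV or TSV. It compares the number of physical lines with the number of records that Python's standard `csv` module parses. If the two numbers differ, some quoted cells span several lines. The function name is illustrative only.

```python
import csv


def count_lines_and_records(path, delimiter=","):
    """Return (physical_lines, parsed_records) for a delimited text file.

    Both counts include the header row. They differ when a quoted cell
    contains line breaks, because one record then spans several lines.
    Use delimiter="\t" for a TSV export.
    """
    # newline="" keeps line endings untranslated so the csv module can
    # correctly handle line breaks inside quoted cells
    with open(path, newline="", encoding="utf-8") as f:
        physical_lines = sum(1 for _ in f)
    with open(path, newline="", encoding="utf-8") as f:
        parsed_records = sum(1 for _ in csv.reader(f, delimiter=delimiter))
    return physical_lines, parsed_records
```

For example, `count_lines_and_records("dataset.csv")` (a hypothetical file name) should report 35,515 parsed records for this dataset if the export is intact. Python's `csv` parser is not the one OpenRefine uses, so a match here only shows that the file itself is well-formed.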
What I have already tried:
- Exporting the file from Excel/LibreOffice as CSV (UTF-8) and importing that instead of XLSX.
- Exporting as TSV and importing that instead.
- Checking the import options (UTF-8 character encoding, column separator, parsing the first line as column headers, "Store blank rows").
- Verifying that there are no completely empty rows or embedded line breaks in the data.
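The last check in the list above, looking for completely empty rows and embedded line breaks, can also be scripted. This makes it repeatable across all 35,515 rows. The sketch below is a minimal example that uses Python's standard `csv` module. It assumes a UTF-8 CSV or TSV export, and the function name is hypothetical. It reports the row numbers that would need a closer look in LibreOffice.

```python
import csv


def scan_rows(path, delimiter=","):
    """Find rows that are completely empty or have a line break inside a cell.

    Row numbers are 1-based with the header as row 1. Because a cell with an
    embedded line break still occupies a single spreadsheet row, these numbers
    should line up with the row numbers shown in LibreOffice.
    """
    empty_rows = []
    multiline_rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for row_number, row in enumerate(csv.reader(f, delimiter=delimiter), start=1):
            # No fields at all, or only whitespace in every field
            if all(not cell.strip() for cell in row):
                empty_rows.append(row_number)
            # A cell containing \n or \r spans several physical lines in the file
            if any("\n" in cell or "\r" in cell for cell in row):
                multiline_rows.append(row_number)
    return empty_rows, multiline_rows
```

Two empty lists mean the file has neither problem, at least as Python's parser reads it.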
Despite these attempts, the problem persists and I cannot get the full 35,514 rows into OpenRefine.
Has anyone experienced a similar issue, or can anyone suggest a reliable way to ensure that all rows are retained during import?
Thanks in advance for your guidance!
Parthasarathi Mukhopadhyay