How to read large JSON file

Hi, I have a file that is 1,400,751KB which seems to be too big for openrefine. Is there any way I can reduce the file, or ask openrefine to only load sections of it? It managed to work with a 361,642KB file just fine. Thanks

1.4 GB is pretty big! Unfortunately, there's no way to skip importing certain columns, although you can delete them after import. You can limit the number of rows imported, but there's no way to skip rows at the beginning, which makes processing the file in chunks problematic.

Outside of OpenRefine, you could use the jq utility to subset the fields/columns or the records in the file. If the structure is simple enough, you might even be able to do the row/record subsetting with a text editor or the Unix head command.
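For example, here is a minimal sketch of the jq approach. It assumes the file is a top-level JSON array; the file name and the field names (`id`, `title`, `body`) are hypothetical stand-ins for whatever your data actually contains:

```shell
# Tiny sample file standing in for the large export
# (top-level JSON array; "id"/"title"/"body" are hypothetical field names).
printf '[{"id":1,"title":"a","body":"x"},{"id":2,"title":"b","body":"y"},{"id":3,"title":"c","body":"z"}]' > sample.json

# Keep only the fields/columns you need, dropping the bulky "body" field.
jq '[.[] | {id, title}]' sample.json > columns.json

# Or take just the first N records (here 2) to process the file in chunks.
jq '.[0:2]' sample.json > first_two.json
```

Both transformations stream through jq, so they work on files far larger than what OpenRefine can load in one go.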

What was the source of the file? If it's the output of a search API or something similar, perhaps you could change the query parameters and subset things at the source.

Sorry I can't be of more help!

Tom

We could help by providing more subsetting options at import time for JSON files.

  1. We could add the same column-rename input box as in the CSV/TSV importer and also allow it to serve as an "only these columns" selector via a toggle?
  2. We could add the option to skip X number of rows from the beginning?

Would those be viable enhancement requests for the JSON importer?

Depending on the data structure, it sometimes helps to convert the JSON to JSONL (https://jsonlines.org/).

This allows the file to be parsed line by line, which (depending on the parser) makes it easier to handle larger files.
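The conversion can be sketched with jq, assuming the input is a top-level JSON array (the file names and the `id` field are hypothetical):

```shell
# Sample input standing in for the real file: a top-level JSON array.
printf '[{"id":1},{"id":2},{"id":3},{"id":4}]' > records.json

# jq -c prints each array element as one compact line -> JSONL.
jq -c '.[]' records.json > records.jsonl

# Once the file is line-oriented, plain Unix tools can chunk it,
# e.g. take just the first two records:
head -n 2 records.jsonl > chunk1.jsonl
```

Each resulting line is a complete JSON document, so a chunk produced with `head` (or `split -l`) is itself valid JSONL that OpenRefine can import.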

And yes, OpenRefine is able to read JSONL :heart_eyes: