How to read a large JSON file

Hi, I have a file that is 1,400,751 KB, which seems to be too big for OpenRefine. Is there any way I can reduce the file, or ask OpenRefine to only load sections of it? It managed to work with a 361,642 KB file just fine. Thanks

1.4 GB is pretty big! Unfortunately, there's no way to skip importing some columns, although you can delete them after the fact. You can limit the number of rows that are imported, but there's no way to skip a number of rows at the beginning, which makes processing the file in chunks problematic.

Outside of OpenRefine, you could use the jq utility to subset the fields/columns or the records in the file. If the structure is simple enough, you might even be able to do the row/record subsetting with a text editor or the Unix head command.
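For example, a couple of jq one-liners (a rough sketch, assuming the file is a single top-level JSON array of records; the field names and filenames are placeholders):

```sh
# Keep only two fields from each record ("id" and "title" are hypothetical names)
jq '[ .[] | {id, title} ]' big.json > fields-only.json

# Keep only the first 100,000 records (array slice)
jq '.[0:100000]' big.json > chunk.json
```

Note that plain jq parses the whole file into memory, which may or may not be workable at 1.4 GB.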

What was the source of the file? If it's the output of a search API or something similar, perhaps you could change the query parameters and subset things at the source.

Sorry I can't be of more help!

Tom

We could help by providing more subsetting options at import time for JSON files.

  1. We could add the same column-rename input box as in the CSV/TSV importer, with a toggle that lets it double as an "only these columns" selector.
  2. We could add an option to skip a given number of rows at the beginning.

Would those be viable enhancement requests for the JSON importer?

Depending on the data structure, it sometimes helps to convert the JSON to JSONL (https://jsonlines.org/).

This allows the file to be parsed line by line, which (depending on the parser) makes larger files easier to handle.
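For instance, a minimal sketch of the conversion with jq, again assuming a single top-level JSON array (filenames are placeholders):

```sh
# Emit each array element as one compact line of JSON (= JSONL)
jq -c '.[]' big.json > big.jsonl

# If the whole file doesn't fit in memory, jq's streaming parser
# can do the same conversion incrementally
jq -cn --stream 'fromstream(1|truncate_stream(inputs))' big.json > big.jsonl
```

Each line is then an independent record, so standard tools like head or split can chunk the file before importing it into OpenRefine.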

And yes, OpenRefine is able to read JSONL :heart_eyes:
