How do I change the maximum number of characters defined in the parser settings?
Thanks in advance.
Hi Lenn,
Did you find where to make your changes in our code below? For finding things in code, you can just do a code search from GitHub or your IDE for something like "CsvParserSettings".
Also, if you need more detailed help with their library, you can probably ask uniVocity here: uniVocity/univocity-parsers (github.com)
Before we dive into changing things, can you describe what it is that you were trying to do when you got this error? It may just be a bug or some other issue.
Was it a CSV file that you were trying to load? Did it have especially long lines or large text strings in the cells?
Same question as OP. Parsing a log file with JSON in one column that could be pretty large, but not unbounded.
Is this fixed here? OpenRefine/main/src/com/google/refine/importers/SeparatorBasedImporter.java at master · OpenRefine/OpenRefine · GitHub
(I'm curious how the 32K in the code maps to the 256K shown in the error; four bytes per Unicode character?)
Would be great if this was configurable.
Hi Paul. Glad you found your way from the old mailing list. The usual questions apply: what version of OpenRefine, CSV or TSV, what error are you getting, do you have example data that demonstrates the problem, etc.
When we increased the max column count from 512 to 16K, we also reduced the maximum cell size from 256K to 32K (the default is 4K), to balance memory usage a little. If you'd like this value to be configurable, please open a feature request. Including a PR with your feature request would be awesome.
What size fields are you looking to be able to work with?
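If it helps to answer that, here is a rough sketch for measuring the longest cell in a delimited file before importing it. It is not OpenRefine code and does not use the uniVocity parser; the class name `LongestCell` and the method `longestField` are made up for illustration. It assumes fields are quoted with double quotes and that embedded quotes are doubled, and it counts the quote characters themselves, so the result slightly overestimates the real cell size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of OpenRefine or uniVocity.
public class LongestCell {

    /**
     * Returns the length in chars of the longest field seen in the input.
     * Separators and line breaks inside double-quoted fields are counted
     * as part of the field. Quote characters are counted too, so the
     * result is a slight overestimate.
     */
    public static int longestField(Reader in, char separator) throws IOException {
        int longest = 0;
        int current = 0;
        boolean inQuotes = false;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '"') {
                // A doubled quote ("") toggles twice, leaving the state unchanged.
                inQuotes = !inQuotes;
                current++;
            } else if (!inQuotes && (c == separator || c == '\n' || c == '\r')) {
                // End of a field: record its length and start the next one.
                longest = Math.max(longest, current);
                current = 0;
            } else {
                current++;
            }
        }
        // Account for a last field that is not followed by a line break.
        return Math.max(longest, current);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java LongestCell <file> [separator]");
            return;
        }
        char separator = args.length > 1 ? args[1].charAt(0) : ',';
        // Assumes the file is UTF-8; change the charset if yours differs.
        try (BufferedReader reader =
                Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            System.out.println("Longest field (chars): " + longestField(reader, separator));
        }
    }
}
```

If the number it prints is above 32K (32,768), the file would likely hit the limit described above.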
@paulmakepeace Hi long lost Paul!
Hi there, I have this installed on Windows and I'm getting the same error as in the screenshot above. My data does have HTML, so it may be too large and I need to bump up that parser setting number. Which file do I edit in the Windows version, please? Thanks
Hi @ezzy007
If you are using the latest version of OpenRefine and trying to read large amounts of HTML data from a file (or rows of very long strings containing HTML data), then you might try the Line Based importer instead of the CSV/TSV importer. Once the data is imported, you can split it based on separator characters (like a comma, or even custom separators). If you then need to parse the HTML, you can use GREL's various functions to parse it into individual columns as necessary. See: GREL functions | OpenRefine
If you need further help, just open a new topic here --> Support and Helpdesk category.
I have opened a feature request for customizing the maximum number of characters per cell. This feature would be helpful.