Hi:
I'm trying to load a really big XML file, AAT.xml, into OpenRefine. This is the well-known Getty Art & Architecture Thesaurus, which anyone can download.
The file:
$ wc --lines AAT.xml
23905953 AAT.xml
$ wc --chars AAT.xml
904373344 AAT.xml
I'm running Fedora Linux 38 and launched OpenRefine 3.7.5 with 10 GB of memory. When loading the file, the application stops and the log outputs:
15:47:08.323 [ refine] POST /command/core/get-importing-job-status (1000ms)
15:47:09.326 [ refine] POST /command/core/get-importing-job-status (1003ms)
15:47:10.323 [ refine] POST /command/core/get-importing-job-status (997ms)
15:47:10.387 [ refine] GET /command/core/get-csrf-token (64ms)
15:47:10.404 [ refine] POST /command/core/importing-controller (17ms)
15:47:10.408 [..e.importers.XmlImporter] Error generating parser UI initialization data for XML file (4ms)
I checked the file with xmllint. It reports only a namespace error, so I don't think there is a structural XML problem:
$ xmllint --nonet --noout AAT.xml
AAT.xml:1: namespace error : Namespace prefix xsi for noNamespaceSchemaLocation on Vocabulary is not defined
AAT/AATGetSubject.xsd" Title="Art & Architecture Thesaurus" Date="2022-6-23"
^
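In case it helps narrow this down, here is a minimal sketch of a second check using a namespace-aware streaming parser (`xml.etree.ElementTree.iterparse` from the Python standard library). The function name `check_namespace_aware` and the script name in the comment are my own, and it is only an assumption, not something I have verified, that OpenRefine's XML importer reacts to the undeclared `xsi` prefix the way this parser does.

```python
import sys
import xml.etree.ElementTree as ET


def check_namespace_aware(source):
    """Stream-parse `source` (a path or a binary file object).

    Returns (ok, detail): on success, detail is the number of elements seen;
    on failure, detail is the parser's error message.
    """
    count = 0
    try:
        # iterparse reads the input incrementally instead of loading it whole.
        for _event, elem in ET.iterparse(source, events=("end",)):
            count += 1
            elem.clear()  # drop text and children of finished elements to limit memory
    except ET.ParseError as exc:
        return False, str(exc)
    return True, count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: python check_xml.py AAT.xml
    ok, detail = check_namespace_aware(sys.argv[1])
    print("OK, elements:" if ok else "Parse error:", detail)
```

If this fails on the root element too, that would suggest the undeclared prefix matters to namespace-aware parsers in general, and not only to xmllint.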
I also tried with 20 GB of memory, with the same result.
Any idea what is happening?