The class com.google.refine.importers.ImportingParserBase contains three public parseOneFile methods. Two of them only throw a NotImplementedException.
The one method with an implementation seems to be invoked only from within the class, so it could be private. The TODO comment already suggests that. I tried changing the visibility to private and it worked without any problems.
I guess you have to implement one of the two unimplemented methods, which would explain why they are not abstract. As a beginner, I find that a bit confusing. I probably stumbled over that when working on an import for the Apache Parquet format (issue #1929). So far, my import has only ever returned a NotImplementedException via the web interface.
Your guess is correct. There are definitely cleaner and less confusing ways to design this API, but it's a bit of legacy cruft. The history is that only one of the two styles of interface was supported initially (probably the Reader version, but I'm not 100% sure) because that worked with the initial set of importers. When an importer that needed access to the raw InputStream was contemplated, the API was refactored while attempting to maintain maximum source and binary API compatibility. This API could certainly be cleaned up, and the Java language has had a lot of enhancements in the last 14 years (the current API was implemented in 2011), which might make it easier to do in a non-breaking way.
> So far, my import has only ever returned a NotImplementedException via the web interface.
The most likely cause for that is a mismatch between the constructor call and the method that you've implemented. The argument to the constructor (useInputStream = true/false) determines which method the calling framework uses.
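To make the mismatch concrete, here is a minimal, self-contained sketch of the dispatch pattern. It is a simplified model, not the actual OpenRefine code: the class names (SketchParserBase, MatchedParser, MismatchedParser), the parse entry point, and the reduced method signatures are all hypothetical, and UnsupportedOperationException stands in for NotImplementedException so that the example needs no external library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-in for ImportingParserBase.
// The real class has longer signatures; only the dispatch idea is modeled.
abstract class SketchParserBase {
    private final boolean useInputStream;

    protected SketchParserBase(boolean useInputStream) {
        this.useInputStream = useInputStream;
    }

    // Stand-in for the calling framework: the constructor flag decides
    // which parseOneFile overload is invoked.
    final String parse(InputStream raw) throws IOException {
        if (useInputStream) {
            return parseOneFile(raw);
        }
        return parseOneFile(new InputStreamReader(raw, StandardCharsets.UTF_8));
    }

    // Both variants are non-abstract so a subclass overrides only one of them.
    protected String parseOneFile(InputStream in) throws IOException {
        throw new UnsupportedOperationException("InputStream variant not implemented");
    }

    protected String parseOneFile(Reader reader) throws IOException {
        throw new UnsupportedOperationException("Reader variant not implemented");
    }
}

// Consistent: passes true and overrides the InputStream variant.
class MatchedParser extends SketchParserBase {
    MatchedParser() {
        super(true);
    }

    @Override
    protected String parseOneFile(InputStream in) throws IOException {
        return "read " + in.readAllBytes().length + " bytes";
    }
}

// Mismatched: overrides the InputStream variant but passes false,
// so the framework calls the Reader variant, which still throws.
class MismatchedParser extends SketchParserBase {
    MismatchedParser() {
        super(false);
    }

    @Override
    protected String parseOneFile(InputStream in) throws IOException {
        return "read " + in.readAllBytes().length + " bytes";
    }
}

class ParserDispatchDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3};
        // prints "read 3 bytes"
        System.out.println(new MatchedParser().parse(new ByteArrayInputStream(data)));
        try {
            new MismatchedParser().parse(new ByteArrayInputStream(data));
        } catch (UnsupportedOperationException e) {
            // prints "mismatch: Reader variant not implemented"
            System.out.println("mismatch: " + e.getMessage());
        }
    }
}
```

In this model the second parser fails even though its override is correct, because the flag passed to the base constructor routes the call to the variant it did not implement. If your importer shows the same symptom, checking the value passed to the super constructor is a reasonable first step.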
Let us know if you have any other questions. It's great to see this importer being worked on!
BTW, I need to refresh my memory about the dataflow, but I suspect that this can be refactored to expose the SeekableInputStream which Parquet requires. That was a bigger task than I had time for when I first took a hack at this last year, but your implementation is a good incentive to revisit the topic.