Hi there,
I'm the metadata coordinator for an archives in Texas. My institution recently upgraded from LODRefine to OpenRefine 3.4.1. We run the software on an Ubuntu VM with 16 GB of RAM, which I tunnel into. We opted for 3.4.1 because our workflows depend on the RDF extension and the VIB-Bits extension, and that version is the latest one they are compatible with per this chart.
I manage several local taxonomies that range from several hundred terms to 25,000 terms in the largest. I regularly reconcile data against these taxonomies. My previous workflow was to model a SKOS RDF file from each taxonomy (usually just term and ID) in OpenRefine using the RDF extension, and then use those RDF files to create a reconciliation service. With LODRefine, this served me well.
However, since the upgrade to 3.4.1, the RDF extension seems to struggle with matching the text I am reconciling. Sometimes the text is 100% identical to a term in the reconciliation service I built from the taxonomy's SKOS file. Not only does the extension fail to match the text, it doesn't even list the identical term as a candidate. If I have to manually reconcile hundreds or sometimes thousands of terms that should have been matched automatically, it adds a lot of time to my work.
Has there been any change in the code for matching terms in recent upgrades to the plugin that would cause it to miss obvious matches? This is a wonderful plugin that has always served me well. I don't remember this being a problem in the past. Has anyone else experienced this? In many instances it seems like it's only looking at the first word in a term and not evaluating the match based on any additional words in the term.
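In case it helps anyone reproduce this, here is a minimal sketch of how the "identical text" claim could be checked outside OpenRefine. It is not part of the RDF extension or of my actual workflow; the function names and the sample labels are hypothetical, and it assumes the taxonomy labels and the column values have been exported as plain lists of strings. It separates values that match a label exactly from values that only match after Unicode normalization, whitespace collapsing, and case folding, which would point to invisible differences in the data rather than to the matcher.

```python
import unicodedata


def normalize(text):
    """Apply NFC normalization, collapse whitespace runs, and casefold."""
    nfc = unicodedata.normalize("NFC", text)
    return " ".join(nfc.split()).casefold()


def classify(values, taxonomy_labels):
    """Label each value as 'exact', 'normalized-only', or 'none'."""
    exact = set(taxonomy_labels)
    normalized = {normalize(label) for label in taxonomy_labels}
    report = {}
    for value in values:
        if value in exact:
            report[value] = "exact"
        elif normalize(value) in normalized:
            report[value] = "normalized-only"
        else:
            report[value] = "none"
    return report


# Hypothetical sample data, standing in for exported labels and cell values.
labels = ["Oil wells", "Cattle ranching", "Cafe\u0301 culture"]
values = ["Oil wells", "cattle  ranching ", "Caf\u00e9 culture", "Oil"]

for value, status in classify(values, labels).items():
    print(repr(value), status)
```

Any value reported as "exact" that the reconciliation service still fails to offer as a candidate would suggest the problem is in the matching step and not in the data.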
Thanks,
Olivia
PS. In searching for an alternative, I am looking at reconcile-csv and am having trouble launching it after following the instructions. Our systems archivist suspects it could be a heap issue given the CSV file sizes. That may be another post.