RDF extension missing clear matches

Hi there,

I'm the metadata coordinator for an archives in Texas. My institution recently upgraded from LODRefine to OpenRefine 3.4.1. We run it on an Ubuntu VM with 16 GB of RAM, which I tunnel into. We opted for 3.4.1 because our workflows depend on the RDF extension and VIB-Bits, and that is the latest version they are compatible with per this chart.

I manage several local taxonomies that range from several hundred terms to 25,000 terms in the largest. I regularly reconcile data against these taxonomies. The workflow I previously used involved modeling a SKOS RDF file for each taxonomy (usually just term and ID) in OpenRefine with the RDF extension, then using those RDF files to create a reconciliation service. With LODRefine, this served me well.

However, after the upgrade to 3.4.1, the RDF extension seems to struggle with matching the text I am reconciling. Sometimes the text is 100% identical to a term in the taxonomy/SKOS/reconciliation service I made, yet the extension not only fails to match it, it doesn't even offer the identical term as a candidate. Having to manually reconcile hundreds or sometimes thousands of terms that should have matched automatically adds a lot of time to my work.

Has there been any change in the code for matching terms in recent upgrades to the plugin that would cause it to miss obvious matches? This is a wonderful plugin that has always served me well. I don't remember this being a problem in the past. Has anyone else experienced this? In many instances it seems like it's only looking at the first word in a term and not evaluating the match based on any additional words in the term.

Thanks,
Olivia

PS. In searching for an alternative, I am looking at reconcile-csv and am having trouble launching it after following the instructions. Our systems archivist suspects it could be a heap issue given the CSV file sizes. That may be another post.


@Olivia just so you know this has been read: I did take a look but was unable to tell why the version of the RDF extension you are using isn't proving effective with matches. The RDF extension has gone through changes since the version in LODRefine, so it's possible that this is the cause, but I wasn't able to dig deep enough to know.
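One check that might help narrow things down before filing an issue is to confirm, outside OpenRefine, whether the values really are character-for-character identical to the taxonomy labels. Below is a minimal Python sketch of that idea. It assumes you can export both the column values and the taxonomy terms as plain lists of strings; the function names and sample terms are made up for illustration and are not part of OpenRefine or the RDF extension. It does not reproduce the extension's matching logic, it only separates strictly identical strings from ones that differ in whitespace, case, or Unicode form.

```python
import unicodedata


def normalize(label):
    """Apply Unicode NFC, collapse runs of whitespace, and case-fold."""
    text = unicodedata.normalize("NFC", label)
    return " ".join(text.split()).casefold()


def classify(values, taxonomy_labels):
    """Label each value as 'exact', 'normalized-only', or 'no match'."""
    exact = set(taxonomy_labels)
    normalized = {normalize(term) for term in taxonomy_labels}
    report = {}
    for value in values:
        if value in exact:
            report[value] = "exact"
        elif normalize(value) in normalized:
            report[value] = "normalized-only"
        else:
            report[value] = "no match"
    return report


# Hypothetical sample data, for illustration only.
taxonomy = ["Oil wells", "Cattle ranching"]
values = ["Oil wells", "Cattle  ranching ", "Oil"]

for value, status in classify(values, taxonomy).items():
    print(repr(value), "->", status)
```

Values reported as "exact" that still get no candidate from the reconciliation service would be good concrete examples to include in a bug report.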

You could try posting an issue on the RDF extension's GitHub tracker at Issues · stkenny/grefine-rdf-extension · GitHub to see if you can get a response there, as the extension isn't maintained by the OpenRefine project.

Sorry I can't be of more help. I'd be interested to know if you were able to get reconcile-csv working in this instance.

PS. In searching for an alternative, I am looking at reconcile-csv and am having trouble launching it after following the instructions.

There's a known bug in reconcile-csv where it announces the wrong URL at startup time, but I would recommend against using it since it's been basically unsupported since the author passed away in 2015. You might want to look at
https://github.com/gitonthescene/csv-reconcile
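Whichever service you end up with, it can help to test it directly rather than only through the OpenRefine UI. The sketch below builds a batch query in the shape used by the Reconciliation Service API (a `queries` form parameter holding a JSON object of keyed queries). The endpoint URL, the helper name, and the sample terms are assumptions for illustration; use whatever URL your service actually reports. The code only builds the request body and does not send it.

```python
import json
from urllib.parse import urlencode


def build_reconcile_request(terms, limit=3):
    """Return the queries dict and the form-encoded body for a batch query."""
    queries = {
        f"q{i}": {"query": term, "limit": limit}
        for i, term in enumerate(terms)
    }
    body = urlencode({"queries": json.dumps(queries)})
    return queries, body


# Hypothetical endpoint; substitute the URL your service reports.
ENDPOINT = "http://localhost:5000/reconcile"

# Hypothetical sample terms, for illustration only.
queries, body = build_reconcile_request(["Oil wells", "Cattle ranching"])

print("POST", ENDPOINT)
print(body)
```

Posting that body to the service with any HTTP client and inspecting the returned candidates shows whether a missing match comes from the service itself or from how OpenRefine is calling it.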

Tom