This is probably a very basic question. I hope it’s ok to ask here.
I am using OpenRefine 3.6.2 and Stuart Kenny’s NER Extension. They both work fine.
I am working on a spreadsheet of Dublin Core records, trying to extract people and placenames from the free-text description field. In this screenshot you can see that the Stanford NLP tool is pretty good at doing that.
I am going to do some work on reconciling and cleaning up those results. When complete, I need to have a CSV with a single row per record.
Once I finish cleaning and reconciling, how do I get the cells in the ‘Stanford NLP’ rows into new ‘subject’ columns on the same row as the rest of the record?
Just some clarification questions:
- Is it the data in the new “Stanford NLP” column that you are reconciling?
- At the end are you looking for a row with multiple “Subject” columns, with each subject column containing reconciled cells? Or once you’ve done the reconciliation are you only interested in the reconciled entity label or ID?
I’m asking these questions because rearranging reconciled cells, including all of the information obtained through the reconciliation process (like the matches made, the remote service ID etc.), is a slightly different task to just moving around plain values - so it’s important to know exactly the required outcome before advising on the best method here.
Thanks for coming back to me.
- Good question. It might be one, or the other, or even both depending on the results. I was thinking of using reconciliation just to divide entries in the Stanford NLP column into, say, people and places, in which case I might just move the plain value. However, if the reconciliation service URI is something the DRI can accept and display, then it wouldn’t make sense to lose that information.
I guess I am getting a bit ahead of myself. Once I know that the information can be moved around that’s the important thing.
I’ll get going with reconciling and cleaning up and refresh this question when I have a concrete example.
I’ll just note that getting a list of plain values (or a set of plain values) from a column to a row is, in my experience, more straightforward than moving cells complete with all the reconciliation data. But I think both can be achieved.
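To make the plain-value case concrete, here is a minimal Python sketch of the reshaping itself, done outside OpenRefine on an exported CSV. It is only an illustration under assumptions, not OpenRefine’s own mechanism: it assumes that only the first row of each record has the `title` column filled in and that the extracted entities sit in a `Stanford NLP` column. The function name `collapse_records`, the `subject_N` column names, and the sample data are all made up for the example, and reconciliation data is not carried over.

```python
import csv
import io


def collapse_records(rows, key_column, value_column, prefix="subject"):
    """Collapse record-style rows into one row per record.

    A row with a non-empty key_column starts a new record; rows with a
    blank key_column are treated as continuations of the previous record.
    """
    records = []
    for row in rows:
        if row.get(key_column):
            # First row of a record: keep every column except the value column.
            current = {k: v for k, v in row.items() if k != value_column}
            current["_values"] = []
            records.append(current)
        elif not records:
            # Continuation row with no record before it: nothing to attach to.
            continue
        value = (row.get(value_column) or "").strip()
        if value:
            records[-1]["_values"].append(value)

    # Pad every record to the same number of subject columns.
    width = max((len(r["_values"]) for r in records), default=0)
    for record in records:
        values = record.pop("_values")
        for i in range(width):
            record[f"{prefix}_{i + 1}"] = values[i] if i < len(values) else ""
    return records


# Made-up sample data in the assumed export layout.
sample = """title,description,Stanford NLP
Record A,Text about Dublin,Dublin
,,Jane Doe
Record B,Text about Cork,Cork
"""

rows = list(csv.DictReader(io.StringIO(sample)))
flat = collapse_records(rows, "title", "Stanford NLP")
for record in flat:
    print(record)
```

Each record ends up as a single row with as many `subject_N` columns as the record with the most extracted values; shorter records are padded with empty strings so the result can be written out as a regular CSV.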
What is DRI? Is this a system you will be exporting the records into? How best to rearrange the records depends on which system you will be exporting into. Telling us the target system will help @ostephens and others advise on the structure you might need in OpenRefine’s rows and records for export.
Hi Thad, thanks for the response.
The DRI is the Digital Repository of Ireland. Apologies, I thought that I had linked the website when I mentioned it above.
I’ll post about the record structure when I get a bit further along in the process.