I’m having problems creating a project from JSON and RDF sample files. The data in both files is the same: they are exports from a MediaWiki website (https://tunearch.org/) whose data I would like to upload to WD. I already have their permission.
I attach a zip with the sample files.
The problem with both files is that they don’t produce the correct project, which should contain only seven records, no matter which record path I specify for the JSON file. There is nothing I can configure for the RDF file, but it has the same problem. I wonder whether there is a problem with their format.
The imported records should be these, but the projects created are anything but these records:
I would appreciate it if someone could confirm whether they can create a project from these files, or whether they see the same problem as I do.
Thank you very much.
sample-files.zip (3.8 KB)
I can confirm that I see the same problem with the JSON. The problem is caused by the use of the tune’s name as a JSON property, rather than a consistent property that is the same for each tune. E.g. we see:
"results": {
  "Green Joke (The)": {
    "printouts": {
      ...
    },
  "Harp that in Darkness": {
    "printouts": {
      "Also known as": [
etc.
rather than something more like:
"results": {
  "tune": {
    "name": "Green Joke (The)",
    "printouts": {
      ...
    },
  "tune": {
    "name": "Harp that in Darkness",
    "printouts": {
      "Also known as": [
etc.
The use of a different property for each entry in the results object means that there isn’t a way for OpenRefine to understand the structure correctly.
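As a workaround, the JSON could be pre-processed before import so that each tune name becomes a regular property on a record. Here is a minimal sketch in Python; the sample data and the "name" field are my own guesses based on the snippet above, not the actual tunearch.org export format:

```python
import json

# Hypothetical sample mirroring the structure shown above;
# the real export will have many more fields per tune.
raw = """
{
  "results": {
    "Green Joke (The)": {"printouts": {"Also known as": []}},
    "Harp that in Darkness": {"printouts": {"Also known as": []}}
  }
}
"""

data = json.loads(raw)

# Turn each tune-name key into a regular "name" property so every
# record shares the same structure, and collect them in a list.
records = [
    {"name": name, **entry}
    for name, entry in data["results"].items()
]

fixed = {"results": records}
print(json.dumps(fixed, indent=2))
```

With the records in a list, OpenRefine’s JSON importer can be pointed at the array and should pick up one row per tune.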
However, the RDF basically works for me. I get 7 records, plus some rows generated from the <owl:DatatypeProperty> and <owl:ObjectProperty> statements. The rest looks, at a glance, OK to me, but as I’m not familiar with the data I could easily be missing something.
How does the project created by the RDF import look to you?
Thanks, @ostephens. I had realized that the JSON had that problem. I also see in the RDF the extra entries you mention, which is quite messy. At least I can confirm that it’s not me but the files, which will not be of much use. A simple CSV is much more useful for now.
What seems odd to me is that those files are generated by the Semantic MediaWiki extension, and I assumed they would be compatible with OpenRefine.
Although there is good integration between OpenRefine and various MediaWiki platforms, they are completely separate projects; OpenRefine is not a MediaWiki project. So, as far as I know, there has never been any discussion of how OpenRefine should work with the Semantic MediaWiki extension. I hadn't come across this extension until you mentioned it here, so I don't really know how it works, but a brief glance does suggest that there are quite a lot of configuration options for the extension. So possibly the way it has been configured for tunearch is also an issue, but I really don't know.
Is there a SPARQL endpoint for the traditional tune archive?
No, there is not. It’s probably a matter of configuration. Thanks anyway. I’ll use the CSV files.
If you are not afraid of the Go programming language, you could try ojg, an extremely fast JSON parser that recently landed a modify() function in its JSONPath expressions: jp feature request: Set() that only replaces existing values · Issue #99 · ohler55/ojg (github.com)
You could ask Peter in an issue if/how it might help your situation to move things around.