I have two columns in OpenRefine. The column OCLC-DDC contains the most frequent DDC number obtained from the OCLC Classify API (by web scraping with each book's ISBN), e.g. 181.045, and the column Local-DDC contains the DDC number of the same book in a local library, e.g. 181.4.
There are many such rows. How can I compare these two columns so that the score is 1 if the two values match exactly, 0.75 if the first three digits and the three digits after the '.' match, 0.50 if only the first three digits (before the '.') match, and 0 if the first three digits do not match? I was trying something like this:
if(cells["DDC-MF"].value.trim().substring(0,3) == cells["NLI-ddc"].value.trim().substring(0,3),
  0.5,
  if(cells["DDC-MF"].value.trim().substring(0,6) == cells["NLI-ddc"].value.trim().substring(0,6),
    0.75,
    if(cells["DDC-MF"].value.trim() == cells["NLI-ddc"].value.trim(), 1, 0)))
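But there are two issues here: 1) the conditions overlap, because two class numbers that match exactly also match on the first three digits and on the digits after the '.', so the weakest test fires first and an exact match still scores 0.5; and 2) it does not seem like an elegant solution.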
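To make the intended scoring concrete, here is a minimal sketch in plain Python rather than GREL. The function name `ddc_score` is hypothetical, and the handling of numbers with fewer than three decimal digits (compared as written, without zero padding) is an assumption on my part, not a settled rule.

```python
def ddc_score(a, b):
    """Score the agreement between two DDC numbers given as strings."""
    a, b = a.strip(), b.strip()
    # Test the strictest condition first, so an exact match is not
    # swallowed by one of the weaker prefix tests.
    if a == b:
        return 1.0
    # partition(".") yields (class digits, ".", decimal digits); the
    # decimal part is "" when the number has no ".".
    a_main, _, a_frac = a.partition(".")
    b_main, _, b_frac = b.partition(".")
    if a_main[:3] != b_main[:3]:
        return 0.0
    # Assumption: compare the first three decimal digits as written,
    # without zero padding, and require a decimal part to be present.
    if a_frac and a_frac[:3] == b_frac[:3]:
        return 0.75
    return 0.5
```

With this rule, `ddc_score("181.045", "181.4")` gives 0.5 and `ddc_score("181.0451", "181.0452")` gives 0.75. The point of the sketch is the ordering: exact match first, then the mismatch on the first three digits, then the decimal comparison.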
Any suggestions?
Thanks and regards