When reconciling in OpenRefine, I can select "Auto-match candidates with high confidence". This does not really work for me, because the confidence values vary between the different data sets I'm matching.
What I need is to be able to set the auto-match threshold manually each time I reconcile. With that, I could run the reconciliation a couple of times and tune the threshold until it auto-matches most entities.
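The rule I'm after is simple to state outside OpenRefine. Here is a minimal sketch, assuming each candidate is a dict with "id", "name" and "score" keys (the shape of the reconciliation API's result lists) and that the threshold is on the same scale as the service's scores. The `auto_match` function and the candidate data are made up for illustration.

```python
from typing import Optional


def auto_match(candidates: list, threshold: float) -> Optional[dict]:
    """Return the best-scoring candidate if it reaches the user-set threshold, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["score"])
    return best if best["score"] >= threshold else None


# Hypothetical candidates, shaped like entries in a reconciliation "result" list
candidates = [
    {"id": "Q1", "name": "Acme Inc", "score": 87.0},
    {"id": "Q2", "name": "Acme Holdings", "score": 61.5},
]

# Trying a couple of thresholds, as I'd do when tuning per data set
for t in (80, 90):
    match = auto_match(candidates, t)
    print(t, match["id"] if match else None)  # prints "80 Q1", then "90 None"
```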
One of the most common cases I'm dealing with at the moment is company names, where one data set includes "Inc" or "Ltd" in the name and the other does not. I need to match all of these automatically: there are a lot of them, and clicking "Search for match" on each one in OpenRefine is not feasible.
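One possible workaround for the "Inc"/"Ltd" case is to normalize names on both sides before comparing them, for example in a notebook or as a cleaned copy of the column. A minimal sketch; the suffix list and the `normalize_company` helper are my own assumptions and would need extending for real data:

```python
import re

# Hypothetical list of legal-form suffixes to ignore when comparing names.
SUFFIXES = ["inc", "incorporated", "ltd", "limited", "llc", "corp", "corporation", "co"]

# A suffix must be preceded by whitespace or a comma and sit at the end of the name,
# optionally followed by a period ("Inc.", ", Ltd").
_SUFFIX_RE = re.compile(r"[\s,]+(?:" + "|".join(SUFFIXES) + r")\.?$", re.IGNORECASE)


def normalize_company(name: str) -> str:
    """Strip trailing legal-form suffixes, lower-case, and collapse whitespace."""
    name = name.strip()
    # Strip repeatedly so stacked suffixes such as "Co. Ltd" are all removed.
    while True:
        stripped = _SUFFIX_RE.sub("", name)
        if stripped == name:
            break
        name = stripped
    return " ".join(name.lower().split())


print(normalize_company("Acme Inc.") == normalize_company("ACME"))  # prints True
```

Matching on the normalized form means "Acme Inc." and "ACME" compare equal, while names that merely contain a suffix-like string (e.g. "Zinc") are left alone because the suffix has to be a separate word.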
The other issue is the scoring mechanism. I need to be able to adjust the weight of each scoring feature, along the lines of:
score = A*name_match + B*identifier_match + C*date_match + D*quantity_match
where the weights A, B, C and D are user-specified. For C and D I should ideally also be able to set weights for individual properties; e.g. if a person has a "birth date" and a "married date", I could choose to weight the birth date more heavily, although this is not strictly necessary.
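To make the weighting request concrete, here is a minimal sketch of the score I have in mind. It assumes each feature has already been turned into a similarity between 0 and 1 (how that is done is a separate question), and `Weights` and `weighted_score` are hypothetical names, not part of OpenRefine or the Wikibase reconciliation service.

```python
from dataclasses import dataclass, field


@dataclass
class Weights:
    """User-specified weights A, B, C, D plus optional per-property overrides."""
    name: float = 1.0        # A
    identifier: float = 1.0  # B
    date: float = 1.0        # C, default for every date property
    quantity: float = 1.0    # D, default for every quantity property
    per_property: dict = field(default_factory=dict)  # e.g. {"birth date": 2.0}


def weighted_score(name_sim: float, id_sim: float,
                   date_sims: dict, quantity_sims: dict, w: Weights) -> float:
    """A*name + B*identifier + sum of C_p*date_p + sum of D_q*quantity_q.

    C_p and D_q fall back to C and D unless the property has its own
    entry in w.per_property.
    """
    score = w.name * name_sim + w.identifier * id_sim
    score += sum(w.per_property.get(p, w.date) * s for p, s in date_sims.items())
    score += sum(w.per_property.get(q, w.quantity) * s for q, s in quantity_sims.items())
    return score


# Birth date weighted twice as much as the default date weight C
w = Weights(name=2.0, identifier=3.0, date=1.0, quantity=0.5,
            per_property={"birth date": 2.0})
print(weighted_score(0.9, 0.0, {"birth date": 1.0, "married date": 0.5}, {}, w))  # prints 4.3
```

This is a plain weighted sum, as in the formula above. If one threshold should work across data sets, dividing by the total weight used would keep the score on a fixed 0 to 1 scale.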
The most important thing is being able to set the auto-match threshold. I'm not sure whether that is already possible in OpenRefine, or whether it would have to be added to the "Reconcile column..." dialog in the UI. What I'd need is really just a text field next to the "Auto-match candidates with high confidence" checkbox where the threshold value could be set, similar to the text box below it where you can set the "Maximum number of candidates to return".
I'm weighing my options here. I'm running the Wikibase reconciliation service locally, so I can modify it if needed. If this is hard to do in OpenRefine itself for now, I'm wondering whether to just call the service directly from a Jupyter notebook instead.
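If I go the Jupyter route, the service can be queried directly through the reconciliation API, which accepts a JSON batch of queries in a form field named `queries` and returns, per query key, a `result` list of candidates with `id`, `name` and `score`. A minimal sketch using only the standard library; the endpoint URL is an assumption about my local setup, and `build_queries`, `reconcile` and `pick_matches` are hypothetical helpers:

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: URL of the locally running Wikibase reconciliation service; adjust as needed.
ENDPOINT = "http://localhost:8000/en/api"


def build_queries(names: list, limit: int = 5, type_id: Optional[str] = None) -> dict:
    """Build a batch of the form {"q0": {"query": ..., "limit": ...}, ...}."""
    queries = {}
    for i, name in enumerate(names):
        q = {"query": name, "limit": limit}
        if type_id:
            q["type"] = type_id  # optional type restriction, e.g. a class QID
        queries[f"q{i}"] = q
    return queries


def reconcile(queries: dict, endpoint: str = ENDPOINT) -> dict:
    """POST the batch as the form field "queries" and return the parsed JSON response."""
    data = urllib.parse.urlencode({"queries": json.dumps(queries)}).encode()
    with urllib.request.urlopen(endpoint, data=data) as resp:
        return json.load(resp)


def pick_matches(response: dict, threshold: float) -> dict:
    """Map each query key to its top candidate's id if the score reaches the threshold, else None."""
    matches = {}
    for key, payload in response.items():
        best = max(payload.get("result", []), key=lambda r: r["score"], default=None)
        matches[key] = best["id"] if best is not None and best["score"] >= threshold else None
    return matches


# In a notebook (requires the local service to be running):
# response = reconcile(build_queries(["Acme Inc", "Foo Ltd"]))
# matches = pick_matches(response, threshold=85)
```

Keeping the HTTP call separate from `pick_matches` means I could fetch the candidates once and then re-run the matching with different thresholds without querying the service again.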
And for the Wikibase reconciliation service: would adding such user-specified weighting require modifying the code, or is it already supported through some parameters?