What does it mean when something is 'unreconciled'

Hello,

I’m reconciling a large set of named entities against Wikidata. I ran it through and got about 3000 results as ‘unreconciled’. What does that mean, as opposed to a ‘none’ result where it didn’t find any options? Is ‘unreconciled’ where something is just not working? For instance, I got an ‘unreconciled’ for “BARNEYS NEW YORK”, which I know has a Wikidata entry.

Any ideas for what to do with the subset that are unreconciled? Can I rerun those without re-running the other 300,000 names that were successful?

Thanks,

Joseph Anderson

Some more detail on this, using the example of BARNEYS NEW YORK. If I change that data to just ‘Barneys’ or even ‘Barney nyc’ it reconciles correctly, but if I try it with the full BARNEYS NEW YORK it comes back as ‘unreconciled’. This is very peculiar, as the actual entry is Barneys New York. I’ve tried matching against types AND against no particular type, and it just seems that BARNEYS NEW YORK causes problems.

Any ideas?

I don’t know what causes the problem, but often I find if I reconcile several thousand items there will be tens of items that just… don’t reconcile. Generally if I try them a second time (by faceting on judgement, and then selecting unreconciled) I can get them to reconcile, but recently I had a whole batch that refused to reconcile despite me trying multiple times. There was no observable difference from the cells that did reconcile, and it makes no difference whether the item exists in Wikidata or not. My solution was to mark them all as new items, and then individually reject the reconciliation match, which gave me access to reconciliation results somehow.

It’s an odd error and if there’s a way to find out what causes it I’d love to know!

When a cell goes through the reconciliation process, some additional data is created in a property of the cell:
cell.recon

This recon object stores all the reconciliation information for the cell (see Expressions | OpenRefine for more information).

In general language, I’d use ‘unreconciled’ to mean the cell hasn’t been through the reconciliation process yet. It’s important to differentiate between something that hasn’t been through the process and something that’s been through the process but didn’t find a match - I’d use unreconciled to mean the former, not the latter.

However, when we see the value (unreconciled) in the Judgement Facet for a column it’s more specific - it’s an assessment of the value of the property:
cell.recon.judgment

If you click the “Change” button at the top right of the Judgement facet you can actually see the GREL being used:
forNonBlank(cell.recon.judgment, v, v, if(isNonBlank(value), "(unreconciled)", "(blank)"))

This expression returns ‘(unreconciled)’ if the cell itself is not empty (isNonBlank(value) is true) but cell.recon.judgment IS blank. If the cell is empty too, it returns ‘(blank)’; otherwise it returns the judgment itself.
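To make that logic easier to follow, here is a rough Python sketch of what the facet expression does. It is only an illustration, not OpenRefine’s code: the function names are made up for this post, and the blank check is my approximation of GREL’s isNonBlank (null or empty string counts as blank).

```python
from typing import Optional


def is_blank(v: object) -> bool:
    # Approximation of GREL blankness: null or an empty string.
    return v is None or (isinstance(v, str) and v == "")


def judgment_facet_label(value: object, judgment: Optional[str]) -> str:
    # Hypothetical helper mirroring:
    # forNonBlank(cell.recon.judgment, v, v,
    #             if(isNonBlank(value), "(unreconciled)", "(blank)"))
    if not is_blank(judgment):
        # The cell has a recon judgment, so the facet shows it as-is.
        return judgment
    # No judgment: distinguish an empty cell from a cell with content.
    return "(blank)" if is_blank(value) else "(unreconciled)"
```

So a cell containing BARNEYS NEW YORK with no judgment lands in (unreconciled), while the same cell with a judgment of "none" lands in none, which is the distinction asked about in the original question.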


In this case it looks like a cell that has (in theory at least) been through the reconciliation process is appearing as unreconciled. Looking at the underlying code I can see that this is a deliberate choice of outcome in some circumstances where the reconciliation process fails for some reason behind the scenes. (I’m not suggesting looking at the code but just for reference, this is the relevant code)

The final part of the puzzle is why the reconciliation fails in such a way that we get nothing back from the reconciliation service, and I don’t have an answer to that. I’ve just tried reconciling “BARNEYS NEW YORK” and initially got the same outcome as you describe here. After several attempts it then suddenly worked, and now it reconciles consistently, which sounds similar to @DrThneed’s experience with their data.

I think there are two things (on the development side) it would be good to do as a result of this question:

  1. Consider changing the behaviour on unsuccessful reconciliation so that it is clearer to the user what has happened
  2. Investigate what is causing the underlying issue of the reconciliation failing from Wikidata

I agree we should store errors in cells for the reconciliation operation. There is a proposal to add an option for it in the reconciliation dialog, but I would argue this should be the default:

https://github.com/OpenRefine/OpenRefine/issues/3194

Yeah, I seem to be experiencing the same phenomenon as @DrThneed. I’ve rerun the reconciliation on the ‘unreconciled’ subset, and with each consecutive pass it successfully reconciles about half of them. So, where my initial reconciliation resulted in 3000 unreconciled, after running it 3 more times, I’m down to 500.

But then there’s something like ‘BARNEYS NEW YORK’, and another example, ‘LORD AND TAYLOR’: if I flag a single row and run the reconciliation on it, it will sometimes match without an issue, but when I run it as part of the larger set, I get ‘unreconciled’. So something buggy is occurring.

I wonder if there’s some sort of timeout or rate limiting happening on the Wikidata end.
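If it is a timeout or rate limit, the manual routine of rerunning only the unreconciled subset amounts to a retry loop with a pause between passes. Here is a minimal Python sketch of that idea. Everything in it is an assumption for illustration: reconcile_batch is a hypothetical callable standing in for whatever actually talks to the service, and it is assumed to return a dict containing only the names the service answered (an empty candidate list would be a genuine ‘none’).

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

Candidates = List[dict]


def reconcile_with_retries(
    names: Iterable[str],
    reconcile_batch: Callable[[List[str]], Dict[str, Candidates]],
    max_passes: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Dict[str, Candidates], List[str]]:
    # Names the service has answered so far (even with zero candidates).
    results: Dict[str, Candidates] = {}
    # De-duplicate while preserving the original order.
    pending = list(dict.fromkeys(names))
    for attempt in range(max_passes):
        if not pending:
            break
        if attempt > 0:
            # Exponential backoff before each retry pass: 1x, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
        answered = reconcile_batch(pending)
        results.update(answered)
        # Only names with no response at all are retried.
        pending = [n for n in pending if n not in answered]
    return results, pending
```

Anything still in the second return value after the last pass is what would show up as ‘unreconciled’, and the 300,000 names that already succeeded are never resubmitted.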

-Joseph

I've recreated the issue with this value ... and also had it work. But that's basically just trying to reconcile a single cell - so it might be an error specific to this value (but then as you note sometimes it works fine).

I'm a bit baffled by this, but there are a lot of moving parts involved in a successful reconciliation process, so it will take some investigation to track down the underlying cause. Ultimately I suspect the error itself comes from outside OpenRefine, but OpenRefine isn't being very helpful in terms of informing the user what's happened.

I just had this same problem. Some 30 rows wouldn’t reconcile at all, however many times I tried, even though most of those items already existed on Wikidata. The only solution has been to mark them as new items in the column and then, on each cell, “Choose new match”, “Search for match” and select the right one.

The main difference between reconciliation and “search for match” is that the first can have a type constraint and the second cannot. So all items will be proposed in “search for match”, regardless of their type.

If you reliably cannot reconcile your items even without specifying any type constraint in the reconciliation dialog, this likely means that your items are indirectly instances of Wikimedia internal item (Q17442446) which the reconciliation service avoids because such items do not refer to real world entities or concepts.

In those cases it is useful to debug Wikidata’s type hierarchy using the Wikidata Graph Builder.
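As a complement to the graph view, a SPARQL ASK query can check the same thing for a single item. The Python sketch below only builds the query string, which can then be pasted into the Wikidata Query Service. It assumes that “indirectly an instance of” means instance of (P31) followed by any number of subclass of (P279) steps; I have not confirmed that this is exactly the path the reconciliation service checks, and the function name is just for illustration.

```python
import re

# Wikimedia internal item, the class mentioned above.
INTERNAL_ITEM = "Q17442446"


def build_internal_item_check(qid: str) -> str:
    # Hypothetical helper: returns a SPARQL ASK query that is true when
    # the item is an instance (P31) of INTERNAL_ITEM or of any class
    # reachable from its P31 value through subclass-of (P279) links.
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError("expected a Wikidata item ID such as Q42, got %r" % qid)
    return "ASK { wd:%s wdt:P31/wdt:P279* wd:%s }" % (qid, INTERNAL_ITEM)
```

Calling build_internal_item_check with the QID of an item that refuses to reconcile gives a query whose true/false answer says whether the type hierarchy is a plausible explanation.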

Otherwise it’s just the unreliability of the Wikidata reconciliation service. Maybe someone builds something better one day!
https://phabricator.wikimedia.org/T244847