Incomprehensible behaviour on trying to add new (and not merge) statements in Wikidata

I'm working on electoral candidates in the New Zealand general election. I have successfully added statements for candidates standing in an electorate (P3602 candidacy in election, value 2023 New Zealand general election, with qualifiers of P768 electoral district and P102 member of political party).

However, I now need to add statements for each candidate's ranking on their party list (a different way to enter parliament). I cannot make these edits using QuickStatements, because the property and main value will be the same as in the existing statement and only the qualifiers differ: P768 is included but with the electoral district given as list candidate, a series ordinal gives the list position, and the member of political party qualifier will sometimes have the same value and sometimes a different one.
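To make the intent concrete, here is a minimal sketch of the two statements as plain data. It only illustrates the point that they share a property and main value but differ in qualifiers; the `Statement` class and the `Q_...` identifiers are hypothetical placeholders (the real QIDs are not given here), P1545 is assumed to be the series ordinal property, and the party qualifier is left out for brevity.

```python
from dataclasses import dataclass

# Placeholder identifiers: the real QIDs are not given in this thread.
ELECTION_2023 = "Q_ELECTION_2023"
SOME_ELECTORATE = "Q_SOME_ELECTORATE"
LIST_CANDIDATE = "Q_LIST_CANDIDATE"


@dataclass(frozen=True)
class Statement:
    prop: str                            # e.g. "P3602" (candidacy in election)
    value: str                           # main value, e.g. the election item
    qualifiers: frozenset = frozenset()  # set of (property, value) pairs


# Existing statement: standing in an electorate.
electorate_candidacy = Statement(
    "P3602", ELECTION_2023, frozenset({("P768", SOME_ELECTORATE)})
)

# Wanted statement: party list position (P1545 assumed to be series ordinal).
list_candidacy = Statement(
    "P3602",
    ELECTION_2023,
    frozenset({("P768", LIST_CANDIDATE), ("P1545", "3")}),
)


def same_property_and_value(a: Statement, b: Statement) -> bool:
    """True when only the property and main value are compared."""
    return (a.prop, a.value) == (b.prop, b.value)


print(same_property_and_value(electorate_candidacy, list_candidacy))  # True
print(electorate_candidacy == list_candidacy)                         # False
```

A tool that merges on property and main value alone would treat these as one statement, which is why they need to be added as separate statements rather than merged.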

No matter which add or add/merge settings I use, I cannot make the edits, even though no error is reported. Additionally, and alarmingly, in one attempt OpenRefine (OR) decided that the 2020 general election statements somehow matched (see edit history)! I updated to the latest OR version today, with no change in behaviour.
Example item here, the 2020 statements show what I am trying to achieve: David Seymour - Wikidata
I believe this error is similar to one reported in January by Helen Williams, which doesn't appear to have been resolved.
Help much appreciated as I am unable to complete the task of loading all our election info for NZ until I can add statements without merging them.

ETA: This is what the schema does when applied to an item that does not have a pre-existing P3602 candidacy statement: Liz Gunn - Wikidata
And this is Helen's report of what looks to be the same issue: Adding additional statements to Wikidata property

So I gave up trying to diagnose the problem. For some reason it is item-specific - after finding I could add the statements fine to the Wikidata sandbox, I added them to some other candidates successfully, and I think there may only be two items OR refused to edit for me - the two I had used to test the schema! (Q99771411 and Q17055796). I've added the statements I needed by hand. Curious if anyone has any suggestions as to what was going on though.

Any idea what you did to successfully get the statements added via OpenRefine @DrThneed, or did you do them all by hand?
I tried working on my unresolved issue this morning and am still stuck! I've posted on that thread to update it and see if anyone can help. :slight_smile:

Ugh sorry to hear you're still stuck. Here's what I did:

  1. Established there was no problem with the schema itself by using it to add a statement to a Wikidata sandbox.
  2. Established that the pre-existence of a 2023 election candidacy statement with different qualifiers didn't prevent the addition of the statement (again, using the Wikidata sandbox).
  3. Selected a small number of rows in OR to try the statements for real (avoiding the initial two people I used and failed with), and then checked my edit contributions to make sure the edits were made (they were).
  4. Did a number of similar batches (10, 20, 30 people), and counted rows of edit contributions to be sure none were failing (all was fine).
  5. Went ahead and did the remaining several hundred and checked a few of them at random (all good).
  6. Tried the initial two people again, they both failed again, so I added the statements by hand.

So I am still worried that OR didn't report any error or tell me no edit was made for those two people, and I cannot see any reason for them failing. I will have to cross my fingers and hope the same problem doesn't raise its head when I'm trying to get the election results in as qualifiers to the statements later on!
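Since OR reported nothing for the failed items, a per-item check after each batch may be more reliable than counting contribution rows. The following is a minimal sketch, assuming the standard Wikibase `wbgetclaims` API action on wikidata.org; the function names, the User-Agent string, and the idea of checking for required qualifier properties are illustrative assumptions, not something OR provides.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"


def fetch_claims(qid, prop="P3602"):
    """Fetch all statements for one property on one item via wbgetclaims."""
    params = urlencode(
        {"action": "wbgetclaims", "entity": qid, "property": prop, "format": "json"}
    )
    # The User-Agent value is a hypothetical placeholder.
    req = Request(f"{API}?{params}", headers={"User-Agent": "candidacy-check-sketch/0.1"})
    with urlopen(req) as resp:
        return json.load(resp).get("claims", {}).get(prop, [])


def has_statement(claims, main_value_qid, required_qualifier_props):
    """True if some claim has this main value and all the given qualifier properties."""
    for claim in claims:
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "novalue" / "somevalue" statements
        if snak["datavalue"]["value"].get("id") != main_value_qid:
            continue
        if set(required_qualifier_props) <= set(claim.get("qualifiers", {})):
            return True
    return False
```

For example, `has_statement(fetch_claims("Q99771411"), election_qid, ["P768", "P1545"])` would report whether a list-candidacy statement actually landed on that item, where `election_qid` is the QID of the 2023 election and P1545 is assumed to be the series ordinal property.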

Annnd... a SPARQL query revealed there were in fact a further seven statements missing that OpenRefine wouldn't make. I have tracked down some of them and have spotted a pattern which, if I'm right, does indicate a problem with OpenRefine.
It seems that OpenRefine is mis-identifying when a statement already exists.
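For anyone wanting to run a similar check, here is a rough sketch of the kind of query involved, not the exact one I used. It assumes the standard Wikidata Query Service prefixes (`wd:`, `p:`, `ps:`, `pq:`) and that P1545 is the series ordinal property; the helper names and the election QID placeholder are hypothetical.

```python
def build_query(candidate_qids, election_qid):
    """Build a SPARQL query listing which candidates have a candidacy
    statement for the given election that carries a series ordinal."""
    values = " ".join(f"wd:{qid}" for qid in candidate_qids)
    return f"""
SELECT DISTINCT ?person WHERE {{
  VALUES ?person {{ {values} }}
  ?person p:P3602 ?statement .
  ?statement ps:P3602 wd:{election_qid} ;
             pq:P1545 ?ordinal .
}}
""".strip()


def missing(candidate_qids, found_qids):
    """Candidates from the upload that the query did not return."""
    found = set(found_qids)
    return [qid for qid in candidate_qids if qid not in found]


# "Q_ELECTION_2023" is a placeholder, not a real QID.
print(build_query(["Q99771411", "Q17055796"], "Q_ELECTION_2023"))
```

Running the query and passing the returned QIDs to `missing` gives the list of items where the statement was silently not created.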

So for instance on this item, Mike Britnell - Wikidata
The person stood in 2020 and again in 2023, and the qualifiers to the statement (series ordinal, party, electorate) are going to be exactly the same.
So EVEN THOUGH THE MAIN VALUE OF THE STATEMENT IS DIFFERENT (i.e. the 2023 election, not 2020), OpenRefine has added the new 2023 reference to the old 2020 statement. :astonished:

(This explains why I had no problem with most items: the series ordinal will have changed for most people, except those high up the list.)
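To spell out the pattern I think I'm seeing, here is a toy model of the suspected behaviour. This is purely a hypothesis expressed as code, not OpenRefine's actual matching logic; the function names and the `Q_...` identifiers are made-up placeholders, and the party qualifier is omitted for brevity.

```python
def lax_find(existing, new):
    """Suspected behaviour: match on property and qualifiers only."""
    for st in existing:
        if st["prop"] == new["prop"] and st["qualifiers"] == new["qualifiers"]:
            return st
    return None


def strict_find(existing, new):
    """Expected behaviour: the main value must match as well."""
    for st in existing:
        if (
            st["prop"] == new["prop"]
            and st["value"] == new["value"]
            and st["qualifiers"] == new["qualifiers"]
        ):
            return st
    return None


# Same list position in both elections, so the qualifiers are identical.
qualifiers = {"P768": "Q_LIST_CANDIDATE", "P1545": "3"}
existing = [{"prop": "P3602", "value": "Q_ELECTION_2020", "qualifiers": qualifiers}]
new_2023 = {"prop": "P3602", "value": "Q_ELECTION_2023", "qualifiers": dict(qualifiers)}

print(lax_find(existing, new_2023)["value"])  # Q_ELECTION_2020 (wrongly "matched")
print(strict_find(existing, new_2023))        # None (a new statement is needed)
```

With the lax version, the 2023 reference ends up on the 2020 statement and no new statement is created, which is what happened on the Mike Britnell item.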

Thanks for ploughing through with this debugging! So it seems that you found situations where the matching with the existing values is too lax, i.e. OpenRefine is treating two values as identical when it should not.

We can likely open an issue about this. To maximize the chances of it getting fixed, it'd be amazing if you could provide a schema (where the subject field can be a constant, pointing to a specific item), exported as a file, which reproduces the bug reliably: uploading the edits should make a single edit which creates the duplicate statement. (The edit can then be undone manually.) If your schema relies on any columns from your project, then a copy of the project (exported as an OpenRefine project archive) would be very helpful.

No worries.

I can get it to reliably make the same mistake on a person item (Mike Britnell), where it fails to make the new 2023 statement and adds the 2023 reference to the 2020 statement.
I can get it to reliably fail to add the statement to the Wikidata sandbox (after copying across the 2020 candidate statement from Mike Britnell, so that there is an existing statement for OR to be confused by), but for some reason it behaves differently in the sandbox and does not add the reference to the 2020 statement.
The project archive is here: electorate-and-list-MPs-election-2023.openrefine.tar.gz - Google Drive
The schema is here: schema(4).json - Google Drive

If you filter by blank in the first column of the project you will get the Britnell and Sandbox items I have been using.


@antonin_d does this need a GitHub issue creating? If so, in which GitHub repo?


I looked through the existing issues and could not find one that matches this particular problem, so I think it would be great to have one indeed. It would be in the main repository.
