How to find / group more than one property (values) after 'Add column from reconciled values'?

Annika_Hendriksen · March 27, 2023, 9:19am

I’ve reconciled persons with Wikidata items and added extra information using ‘Add column from reconciled values’, but the columns for properties that can have multiple values, such as given names (P735), occupations or identifiers, do not show the second or third value of the chosen property.
Switching from Rows to Records does not seem to help; from my initial 18,103 rows with matched items, only 465 records remain.
Is there a way to find these multiple (2nd, 3rd etc.) values and ‘connect’ them to the row they belong to?

ostephens · March 27, 2023, 1:54pm

Hi @Annika_Hendriksen, I’d generally expect the right rows to be grouped together when in Records mode. Could you share a screenshot or some sample data so we can see what the issue is?

Annika_Hendriksen · March 27, 2023, 2:31pm

Hi @ostephens,
here is a screenshot of a few rows

Given name, for example, would result in multiple values in a lot of cases, and I do see 2nd and 3rd names in my export, but ‘floating’ in otherwise empty rows.

Annika_Hendriksen · March 27, 2023, 2:32pm

And here is the screenshot of records

of the same subset of 18,103 matched rows

Annika_Hendriksen · March 27, 2023, 3:11pm

By the way: I started with 80,596 rows and somewhere in the process I got extra rows, up to 92,353 at this moment, but where and when exactly I cannot tell (before or after Add column from reconciled values).

ostephens · March 27, 2023, 4:04pm

What’s the filter you have applied to get from 92353 rows to just 18103?

Also, what is the leftmost column you have in your grid? The Record grouping depends on how the first column(s) in the project look. To get the Record grouping working correctly you need a first column that contains a value in the first row of each record and is blank in the other rows of that record.
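To make the grouping rule above concrete, here is a minimal Python sketch of it. This is not OpenRefine's actual implementation, only a simplified model that assumes a row starts a new record whenever its first cell is non-blank; the sample values (`Q1`, `Anna`, etc.) are made up for illustration.

```python
from typing import List, Optional

Row = List[Optional[str]]  # one cell per column; None or "" counts as blank


def is_blank(cell: Optional[str]) -> bool:
    return cell is None or cell.strip() == ""


def group_into_records(rows: List[Row]) -> List[List[Row]]:
    """Simplified model: a non-blank first cell starts a new record."""
    records: List[List[Row]] = []
    for row in rows:
        if not is_blank(row[0]) or not records:
            records.append([row])      # first row of a new record
        else:
            records[-1].append(row)    # blank key: belongs to the record above
    return records


# Hypothetical data: the second column holds the extra given names.
rows = [
    ["Q1", "Anna"],
    [None, "Maria"],
    ["Q2", "Jan"],
    [None, "Pieter"],
    [None, "Willem"],
]
records = group_into_records(rows)
print(len(records))                 # 2
print([len(r) for r in records])    # [2, 3]
```

If the first column were blank in most rows for some other reason, the same logic would fold many unrelated rows into a single record.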

ostephens · March 27, 2023, 4:06pm

By the way: I started with 80,596 rows and somewhere in the process I got extra rows, up to 92,353 at this moment, but where and when exactly I cannot tell (before or after Add column from reconciled values).

You should be able to tell just by going back step by step in the history until the row count changes.

But the change is very likely to be with the "Add column from reconciled values", as I'd expect that to add rows in cases where there are multiple values for a single property for a single row.

Annika_Hendriksen · March 27, 2023, 5:04pm

The filter is the records that I have matched with a Wikidata item in Column Totalmatches.

I did a lot of matches by hand, checking the suggestions, so going back would take 1800+ steps.

How does the blank work?

I see the numbering of the rows is different from the ID (although the ID is not per se 1, 2, 3 etc.) here, from row 31 on:

I also guess that the number of rows was expanded by “Add columns from reconciled values”, but I am not sure.

ostephens · March 27, 2023, 5:08pm

Ah - so the issue is (or at least one issue is) that you have a sort applied. The Record logic is applied to the rows as numbered in the first column. So to get the grouping to work, either remove the sort OR use the “Reorder permanently” in the sort menu (depending on what the correct sort order is)

The documentation on Records (“Exploring data | OpenRefine”) and the documentation on Sorting (“Sort and view | OpenRefine”) may be helpful.

Annika_Hendriksen · March 27, 2023, 5:55pm

Thank you very much, though I am not aware there is a Sort running…
I did a sort once, but did undo that, as far as I know.
I’ll try to find out, but that will not be stored in the operation history I guess?
I’ll go back to the documentation, Thanks.

ostephens · March 27, 2023, 5:59pm

Apologies - I misread your screen shot - I think you are correct there is no sort applied. I’m just seeing the missing rows from the filter you have applied

ostephens · March 27, 2023, 6:03pm

It’s quite difficult to diagnose with these partial views of the project. Are you able to share a fuller view of the project, or even the full project itself?

If you have a filter applied, and this is removing rows that (for example) don’t have an entry in a particular column, then you’ll end up filtering out the rows with the additional values in (because typically they will only have a value in one column).

The more information you can share the easier it will be to see where the issue is occurring

Annika_Hendriksen · March 27, 2023, 6:18pm

I completely understand; however, this is data about persons, so sharing the whole project does not seem right.
The only filter I use is judgment ‘matched’.
I am now thinking of exporting these 18,103 rows (judgment ‘matched’), creating a new project and re-reconciling the QIDs etc.
But then again: how do I keep the multiple values together after ‘Add column from reconciled values’, by switching from rows to records? I’ll check the documentation beforehand. Thanks again :)

ostephens · March 27, 2023, 6:32pm

I understand - it’s not always possible to share data, and that’s OK!

I’d start by removing the filters - filtering on matched in rows mode will only find rows that have a matched Wikidata ID - and the new rows added won’t have those. Once you’ve removed the filter then you can make sure that in Records mode you get the right rows grouped together.
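A rough Python sketch of why the filter hides the extra values: in rows mode only the rows whose own cell matches survive, while in records mode a whole record is kept if any of its rows matches. The column names (`id`, `judgment`, `given_name`) and the data are hypothetical, and the grouping is a simplified model rather than OpenRefine's own code.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# Hypothetical data: the second row is an extra value added by data extension.
rows: List[Row] = [
    {"id": "1", "judgment": "matched", "given_name": "Anna"},
    {"id": None, "judgment": None, "given_name": "Maria"},
    {"id": "2", "judgment": "none", "given_name": None},
]


def filter_rows(rows: List[Row], column: str, value: str) -> List[Row]:
    """Rows mode: keep only the rows whose own cell matches."""
    return [r for r in rows if r[column] == value]


def filter_records(rows: List[Row], key: str, column: str, value: str) -> List[Row]:
    """Records mode: keep every row of a record if any of its rows matches."""
    records: List[List[Row]] = []
    for row in rows:
        if row[key] or not records:
            records.append([row])
        else:
            records[-1].append(row)
    return [r for rec in records if any(x[column] == value for x in rec) for r in rec]


in_rows_mode = filter_rows(rows, "judgment", "matched")
in_records_mode = filter_records(rows, "id", "judgment", "matched")
print([r["given_name"] for r in in_rows_mode])     # ['Anna']
print([r["given_name"] for r in in_records_mode])  # ['Anna', 'Maria']
```

The second given name only shows up when the filter is evaluated per record, which in turn only works if the record grouping itself is correct.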

In records mode you should see multiple rows grouped in the same colour with a single number for the record. Hopefully you can see this in this screenshot

Annika_Hendriksen · March 27, 2023, 6:37pm

Thanks, I understand how it is supposed to look (like in the screenshot), but unfortunately, even without any filter, the total number of records is only 1,176 against 92,353 rows, so something went wrong, maybe even somewhere at the start.
Thanks for your help so far!

Annika_Hendriksen · March 28, 2023, 12:06pm

Although there was no sort on any column, removing one specific column ‘solved’ the problem: the number of records now equals my original number of rows before adding columns from reconciled values.
Now to find out what in that column caused this drastic reduction in records 🤨

Annika_Hendriksen · March 28, 2023, 3:06pm

Another curiosity arose: when I removed columns that I had earlier created after reconciliation via ‘Add columns from reconciled values’, the number of records increased (while the number of rows stayed the same). How can one explain this?

ostephens · March 28, 2023, 3:46pm

Records are formed by grouping rows together, and rows are grouped based on the content of the first column - so it’s definitely possible for the same set of rows to end up being a different number of records based on the content of the first column. Here is an example where the same rows change from 2 to 3 records based on the column order

2 records:

3 records:

If you are removing a column at the start of the project, then I think it’s not surprising to see the record count go up (or down - either is just as likely). If you are adding/removing a column somewhere in the middle of the project it’s harder to understand what’s happening although my guess is that it’s not impossible this can also lead to extra (or fewer) records in some cases.
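The dependence on column order can be sketched in a few lines of Python. The data below is made up (it is not the data from the screenshots), and the counting rule is the simplified assumption that a non-blank cell in the first column starts a new record.

```python
from typing import List, Optional


def count_records(rows: List[List[Optional[str]]]) -> int:
    """Count records, assuming a non-blank first cell starts a new record."""
    count = 0
    for i, row in enumerate(rows):
        if row[0] or i == 0:
            count += 1
    return count


# Hypothetical rows with columns (A, B); A is blank in the second row.
rows_a_first = [
    ["a", "x"],
    [None, "y"],
    ["b", "z"],
]
# The same rows with the column order swapped to (B, A).
rows_b_first = [[b, a] for a, b in rows_a_first]

print(count_records(rows_a_first))  # 2
print(count_records(rows_b_first))  # 3
```

Removing the first column has the same effect as the swap here: whatever column becomes the new first column decides where records begin.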

I also find that records are not always re-calculated after changes - so I have seen some situations where I’ve had to use the ‘blank down’ option on the first column to force the records to be recalculated after various changes

tfmorris · March 28, 2023, 4:20pm

As Owen said, a complete problem description with examples would make
it much easier to help you. None of your examples show multiple given
name matches, but I suspect this represents the crux of the problem:

Given name, for example, would result in multiple values in a lot of cases, and I do see 2nd and 3rd names in my export, but ‘floating’ in otherwise empty rows.

Multiple given names returned by your data extension operation will
result in rows which are blank, except for the given name column.
OpenRefine knows how to group these together with the original row
into a "record" comprised of multiple rows, but there's no such
concept in traditional spreadsheet formats like CSV or Excel.

The two traditional solutions are either 1) replicate data in the
other columns using Fill Down or 2) concatenate the multiple values
together into a single cell using Join Multi-valued Cells. It looks
like you have too many columns to make the first option easy, so I'd
suggest the second.
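As a rough illustration of what the two options do to the data, here is a small Python sketch. It only mimics the effect of Fill Down and Join Multi-valued Cells and is not how OpenRefine implements them; the column names, the sample rows and the `|` separator are assumptions, and the join assumes the extra rows hold a value only in the joined column.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def fill_down(rows: List[Row], column: str) -> List[Row]:
    """Option 1: copy the last non-blank value of `column` into blank cells below it."""
    filled: List[Row] = []
    last: Optional[str] = None
    for row in rows:
        if row[column]:
            last = row[column]
        filled.append({**row, column: row[column] or last})
    return filled


def join_multi_valued(rows: List[Row], key: str, column: str, sep: str = "|") -> List[Row]:
    """Option 2: collapse each record to one row, joining the values of `column`."""
    joined: List[Row] = []
    for row in rows:
        if row[key] or not joined:
            joined.append(dict(row))       # first row of a new record
        elif row[column]:
            head = joined[-1]              # append the extra value to the record's first row
            head[column] = sep.join(v for v in (head[column], row[column]) if v)
    return joined


# Hypothetical data: the second row carries an extra given name.
rows: List[Row] = [
    {"id": "1", "name": "Anna Example", "given_name": "Anna"},
    {"id": None, "name": None, "given_name": "Maria"},
    {"id": "2", "name": "Jan Example", "given_name": "Jan"},
]

print([r["name"] for r in fill_down(rows, "name")])
# ['Anna Example', 'Anna Example', 'Jan Example']
print([r["given_name"] for r in join_multi_valued(rows, "id", "given_name")])
# ['Anna|Maria', 'Jan']
```

With Fill Down every column that should be repeated has to be filled separately, which is why joining is usually less work when there are many columns.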

Tom

Annika_Hendriksen · March 29, 2023, 7:45am

Thanks for your suggestions!
Indeed, I have a lot of columns, for example to compare the Wikidata data with our local data.
I guess in this case Join Multi-valued Cells is the better option.

The manual states in https://openrefine.org/docs/manual/reconciling#add-columns-from-reconciled-values: "This process may pull more than one property per row in your data (such as multiple occupations), so you may need to switch into records mode after you’ve added columns."

I guess this means that earlier in the process one should apply a permanent sort before switching to records mode (in order to create this unique record key and keep all the added values in the right records)?
