Issue Columnizing by Key/Value Columns with Data in other Columns

ellathompson · July 15, 2025, 7:22pm

Hello! I am working in version 3.9.3 and wishing for improved functionality with the "Columnizing by Key/Value Columns" option. I have a large and complex dataset, so I will simplify it for illustrative purposes. I am working with data that looks like this:

ID	NAME	Name Source	Place Fields	Place Values
1	Haymarket Theatre (theater in London, England, built in 1720)	local	place_id	1264
	Haymarket Theatre (London, England)	LCNAF	place_name	London
			place_type	Inhabited place
			place_tgn	7011781

I want the result to look like this:

ID	NAME	Name Source	place_id	place_name	place_type	place_tgn
1	Haymarket Theatre (theater in London, England, built in 1720)	local	1264	London	Inhabited place	7011781
	Haymarket Theatre (London, England)	LCNAF

However, the actual result looks like this:

ID	NAME	Name Source	place_id	place_name	place_type	place_tgn
1	Haymarket Theatre (theater in London, England, built in 1720)	local	1264
	Haymarket Theatre (London, England)	LCNAF		London
					Inhabited place	7011781

From what I understand, this because in the original dataset, the values in the columns other than Place Fields and Place Values in the first two rows are distinct, whereas in the second two rows they are empty, and therefore identical. Is there a way around this? Thanks so much for the help!

b2m · July 16, 2025, 6:54am

Hi

thank you for the explicit example, which really helps to quickly identify the problem.

What you describe is a common problem when using "Columnizing by Key/Value Columns".
Maybe we should make this more explicit in the docs Transposing | OpenRefine.

The "problem" here is, that OpenRefine is using the NAME column as ids for the transposing.
As some cells in the NAME columns are empty, they are organized with "empty" as key.
Meaning they are in a separate row and not together with their original record.

The solution for this is to use Fill down on the columns with empty cells that are not columnized (NAME and Name Source in your example).

Best
Benjamin

ellathompson · July 16, 2025, 4:03pm

Thanks so much for the response. This helps reduce the number of extra rows, but unfortunately, still does not resolve the core of the problem. The example data which I provided was one Record out of 856 in the project. Each of these Records corresponds to a different Venue, each of which can have any number of alternative names, As such, the two alternative names in the Record I provided, "Haymarket Theatre (theater in London, England, built in 1720)" and "Haymarket Theatre (London, England)" both apply to the same Venue, as does all of the information in the Place Fields and Place Values columns. When I fill down, it eliminates one extraneous row from each record, but the columnized data is still split amongst the two rows of the Record, as the values in the first two rows differ in the other columns of the record.. I imagine that the only way to fix this would be to somehow disregard the columns in the project that don't contains Fields or Values while columnizing, which I assume is not possible, so this might be a fruitless effort, but I thought I would ask if there was a more clever workaround than I have been able to think of, or perhaps if this is something that might change in a future release of OpenRefine. Thanks again for all of your help!

ostephens · July 17, 2025, 9:30am

Hi @ellathompson,

I think you can do the transpose step as currently and then on each of the columns created by the transpose use Edit cells -> Join multi-valued cells - this will bring the value in the column into the first cell of the record (the record being defined by the ID column)

This option becomes cumbersome if you have many new columns created by the transpose, as you have to do the "join" step for each column. But if it is a handful or so then this won't take long.

If this doesn't work, I have a potential alternative approach I'm working on, but its looking a bit complicated, so in the first place I'd suggest the "join multi-valued cells" approach. However if that doesn't work let me know what the issues are and I can see if my other more complicated approach might be applied

ellathompson · July 17, 2025, 1:51pm

Hi!
This worked for me, thanks so much! I had been hesitant to use this approach because I had several Records in the project that had multiple Places associated with the Venue, meaning that the desired result would have more than one row of values in the key columns. However, it occurred to me that after joining multi-valued cells, I could easily split them again, and that worked great. It would still be nice to have a way to disable the use of other columns as keys when trying to columnizing in this way, but this works perfectly well for now. Thanks again for all of your help!

Topic		Replies	Views
Transpose columns to rows Support and Helpdesk wikidata , reconciliation	2	371	December 14, 2022
Help extracting data from a long field column of real estate data Data cleaning and transformations	2	82	June 18, 2024
Transpose value that has binary into rows Data cleaning and transformations	1	287	January 30, 2023
Transpose values from all rows of a certain column into one cell for a new column Data cleaning and transformations	9	415	November 9, 2023
Duplicating rows problem Support and Helpdesk	9	172	June 11, 2025

Issue Columnizing by Key/Value Columns with Data in other Columns

Related topics