Help extracting data from a long field column of real estate data

tev · June 17, 2024, 12:31pm

I have a dataset of real estate data with a lot of columns, but one column in particular contains a kitchen sink's worth of variables. The format is like this:

Amenities|1st Floor Bedroom|Yes;Amenities|Laundry - 1st Floor|Yes;Amenities|Home Warranty|Yes;Bedroom 1|Bedroom 1|Yes;Bedroom 1|Bedroom 1 Length|9.1;Bedroom 1|Bedroom 1 Width|12.8;Bedroom 1|Bedroom 1 Level|First;Condominium|Unit Type|Condo;Condominium|Units in Building|1;

It's an odd way of organizing the data, as there seems to be a key category, a key, and a value. For example, "Amenities" is a key category containing "1st Floor Bedroom," Laundry—1st Floor," and "Home Warranty," each with its own value of different lengths separated by '|'.

The ';' is the separator, but a key-value pair is skipped if the record does not contain information on a certain variable name. Therefore, if I use ';' to separate into columns, the columns do not contain consistent data.

For example, one record starts with

Amenities|1st Floor Bedroom|Yes;Amenities|Laundry - 1st Floor|Yes;Amenities|Home Warranty|Yes;

The next row starts with

Amenities|Laundry - Hookup|Yes;Amenities|Pantry|Yes;

The third row starts with

Amenities|Furniture Included|Yes;Appliances Included|Microwave|Yes;

Do you have any thoughts on the best way to separate these into columns where the columns have consistent values?

tfmorris · June 17, 2024, 8:20pm

You should be able to use Transpose -> Columnize by Key/Value to do this. A quick sketch of the process:

Split multi-valued cells using semicolon (;) as a separator
Optionally, Fill Down to replicate a record ID or other leading column
Add Value column using the expression value.split('|')[-1]
Remove value from key column by splitting and then rejoining all except the last piece value.split('|')[0,-1].join('/')
[I arbitrarily changed the separator too, but you can reuse the same separator]
Transpose -> Columnize by Key/Value

This should give you a column heading for each unique key with all the values underneath.

Hope that helps! Let us know how you make out.

Tom

tev · June 18, 2024, 9:56am

This seems perfect; thank you!

Topic		Replies	Views
Two ways of feature extraction: 1. split multi-value cells; 2. split into multiple columns Data cleaning and transformations	6	1233	March 24, 2023
Extract data from one column and move to a new column Data cleaning and transformations	10	1115	November 24, 2023
Split Column in reverse order Support and Helpdesk	1	33	September 6, 2024
Extract all unique data from a column Support and Helpdesk	4	285	March 5, 2025
Transpose columns to rows Support and Helpdesk wikidata , reconciliation	2	341	December 14, 2022

Help extracting data from a long field column of real estate data

Related topics