Help extracting data from a long field column of real estate data

I have a dataset of real estate data with a lot of columns, but one column in particular contains a kitchen sink's worth of variables. The format is like this:

Amenities|1st Floor Bedroom|Yes;Amenities|Laundry - 1st Floor|Yes;Amenities|Home Warranty|Yes;Bedroom 1|Bedroom 1|Yes;Bedroom 1|Bedroom 1 Length|9.1;Bedroom 1|Bedroom 1 Width|12.8;Bedroom 1|Bedroom 1 Level|First;Condominium|Unit Type|Condo;Condominium|Units in Building|1;

It's an odd way of organizing the data, as there seems to be a key category, a key, and a value. For example, "Amenities" is a key category containing "1st Floor Bedroom," Laundry—1st Floor," and "Home Warranty," each with its own value of different lengths separated by '|'.

The ';' is the separator, but a key-value pair is skipped if the record does not contain information on a certain variable name. Therefore, if I use ';' to separate into columns, the columns do not contain consistent data.

For example, one record starts with

Amenities|1st Floor Bedroom|Yes;Amenities|Laundry - 1st Floor|Yes;Amenities|Home Warranty|Yes;

The next row starts with

Amenities|Laundry - Hookup|Yes;Amenities|Pantry|Yes;

The third row starts with

Amenities|Furniture Included|Yes;Appliances Included|Microwave|Yes;

Do you have any thoughts on the best way to separate these into columns where the columns have consistent values?

You should be able to use Transpose -> Columnize by Key/Value to do this. A quick sketch of the process:

  • Split multi-valued cells using semicolon (;) as a separator
  • Optionally, Fill Down to replicate a record ID or other leading column
  • Add Value column using the expression value.split('|')[-1]
  • Remove value from key column by splitting and then rejoining all except the last piece value.split('|')[0,-1].join('/')
    [I arbitrarily changed the separator too, but you can reuse the same separator]
  • Transpose -> Columnize by Key/Value

This should give you a column heading for each unique key with all the values underneath.

Hope that helps! Let us know how you make out.

Tom

1 Like

This seems perfect; thank you!