I have a dataset of real estate data with a lot of columns, but one column in particular contains a kitchen sink's worth of variables. The format is like this:
Amenities|1st Floor Bedroom|Yes;Amenities|Laundry - 1st Floor|Yes;Amenities|Home Warranty|Yes;Bedroom 1|Bedroom 1|Yes;Bedroom 1|Bedroom 1 Length|9.1;Bedroom 1|Bedroom 1 Width|12.8;Bedroom 1|Bedroom 1 Level|First;Condominium|Unit Type|Condo;Condominium|Units in Building|1;
It's an odd way of organizing the data, as there seems to be a key category, a key, and a value. For example, "Amenities" is a key category containing "1st Floor Bedroom," Laundry—1st Floor," and "Home Warranty," each with its own value of different lengths separated by '|'.
The ';' is the separator, but a key-value pair is skipped if the record does not contain information on a certain variable name. Therefore, if I use ';' to separate into columns, the columns do not contain consistent data.
For example, one record starts with
Amenities|1st Floor Bedroom|Yes;Amenities|Laundry - 1st Floor|Yes;Amenities|Home Warranty|Yes;
The next row starts with
Amenities|Laundry - Hookup|Yes;Amenities|Pantry|Yes;
The third row starts with
Amenities|Furniture Included|Yes;Appliances Included|Microwave|Yes;
Do you have any thoughts on the best way to separate these into columns where the columns have consistent values?