Two ways of feature extraction: 1. split multi-value cells; 2. split into multiple columns

b2m · March 23, 2023, 6:49pm

Hi Lan,

When you use the split multi-valued cells operation, you create a structure that OpenRefine calls a record.

You can use this structure when exporting the data in a format that supports nesting, such as JSON or XML.
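To make the record idea a bit more concrete, here is a rough sketch in plain Python. It is not OpenRefine's own code, and the column names ("title", "authors"), the ";" separator and the sample values are made up for illustration. It splits a multi-valued cell into several rows, where only the first row of each group keeps the other columns, and then folds those rows back into a nested structure that can be written out as JSON.

```python
import json


def split_multi_value(rows, column, separator):
    """Split one column's cell into several rows.

    Only the first row of each group keeps the other columns; the
    following rows leave them empty, which is roughly what a record
    looks like in the grid.
    """
    out = []
    for row in rows:
        values = [v.strip() for v in row[column].split(separator)]
        for i, value in enumerate(values):
            # First value keeps the full row, later values get blank cells.
            new_row = dict(row) if i == 0 else {key: "" for key in row}
            new_row[column] = value
            out.append(new_row)
    return out


def to_records(rows, key_column, multi_column):
    """Group rows into nested records.

    Assumes the first row has a non-empty key cell; a row with an empty
    key cell belongs to the record started above it.
    """
    records = []
    for row in rows:
        if row[key_column] != "":
            record = dict(row)
            record[multi_column] = []
            records.append(record)
        records[-1][multi_column].append(row[multi_column])
    return records


# Hypothetical sample data, only for illustration.
table = [
    {"title": "Book A", "authors": "Ann; Bob"},
    {"title": "Book B", "authors": "Cy"},
]
record_rows = split_multi_value(table, "authors", ";")
nested = to_records(record_rows, "title", "authors")
print(json.dumps(nested, indent=2))
```

The nested list keeps each group together, which is why a format like JSON or XML can carry the record structure without any further tricks.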

If you export the data in a tabular format such as CSV and use it in programs that do not support OpenRefine's records concept, you have to flatten the data somehow.

The operation OpenRefine offers for this is Fill Down, which (simplified) takes a value and repeats it in the empty cells below it, up to the next non-empty cell.
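As a simplified illustration of what filling down does, here is a small plain Python sketch. Again, this is not how OpenRefine implements it; the function name fill_down, the column names and the sample rows are assumptions for the example.

```python
def fill_down(rows, column):
    """Copy the last non-empty value of `column` into the empty cells below it."""
    last_value = ""
    out = []
    for row in rows:
        new_row = dict(row)  # do not modify the caller's rows
        if new_row[column] == "":
            new_row[column] = last_value
        else:
            last_value = new_row[column]
        out.append(new_row)
    return out


# Hypothetical record-style rows: the second row belongs to "Book A".
record_rows = [
    {"title": "Book A", "authors": "Ann"},
    {"title": "", "authors": "Bob"},
    {"title": "Book B", "authors": "Cy"},
]
flat_rows = fill_down(record_rows, "title")
for row in flat_rows:
    print(row["title"], "|", row["authors"])
```

After this step every row stands on its own, so a program that only understands flat CSV rows can still tell which title each author belongs to.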

Your second scenario (column splitting) is somewhat interesting with regard to the type of information.

New York City is a city and NY is a state, so you might want one column for the name of the city and one for the name of the state. For this, OpenRefine offers regular expressions, either for splitting the column or for adding a new column based on the current one.

This is usually enough to cleanly separate most of the data, leaving some manual adjustments for the rest.
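Here is a minimal sketch of the regular-expression idea in plain Python rather than in OpenRefine itself. It assumes the cells look like "New York City, NY", that is, a city name, a comma and a two-letter state code. That cell format, the pattern and the function name split_city_state are my assumptions, so adjust them to your actual data.

```python
import re

# Assumed cell format: "<city>, <two-letter state code>".
CITY_STATE = re.compile(r"^\s*(?P<city>.+?)\s*,\s*(?P<state>[A-Z]{2})\s*$")


def split_city_state(cell):
    """Return (city, state) for a matching cell, or None if it does not match."""
    match = CITY_STATE.match(cell)
    if match is None:
        return None  # leave these cells for manual adjustment
    return match.group("city"), match.group("state")


cells = ["New York City, NY", "Boston ,MA", "somewhere unclear"]
for cell in cells:
    print(repr(cell), "->", split_city_state(cell))
```

Cells that do not fit the pattern come back as None, which matches the point about cleaning up the remainder by hand.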

If you have more elaborate text, it is quite easy to call an ML-enabled API for named-entity recognition from OpenRefine. See, for example, "Using the OpenAI API to apply natural language queries to cells/data".

This does not answer your specific questions, as these are more along the lines of “How will OpenRefine evolve in the future?”. I am just trying to put your problem description in the context of how I would tackle it in OpenRefine.

Cheers, Benjamin
