Hi Lan,
When you use the "Split multi-valued cells" operation, you create something that OpenRefine calls a record.
You can use this structure when exporting the data in a format that supports nesting, such as JSON or XML.
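To make the nesting idea concrete, here is a minimal Python sketch of how rows that belong to one record could be folded into a nested structure. It assumes that a non-blank value in the key column starts a new record and that the rows below it with a blank key belong to that record. The function name `rows_to_records` and the column names `name` and `tag` are made up for illustration, and this is not the output format of OpenRefine's own exporters.

```python
import json


def rows_to_records(rows, key_column, multi_column):
    """Group flat rows into nested records (illustrative sketch only)."""
    records = []
    for row in rows:
        key = row.get(key_column)
        if key:
            # A non-blank key cell starts a new record.
            records.append({key_column: key, multi_column: []})
        value = row.get(multi_column)
        if records and value:
            # Rows with a blank key attach their value to the current record.
            records[-1][multi_column].append(value)
    return records


if __name__ == "__main__":
    # Hypothetical data, as it might look after splitting a multi-valued cell.
    rows = [
        {"name": "Alice", "tag": "a"},
        {"name": "", "tag": "b"},
        {"name": "Bob", "tag": "c"},
    ]
    print(json.dumps(rows_to_records(rows, "name", "tag"), indent=2))
```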
If you export the data in a tabular format like CSV and use it in programs that do not support OpenRefine's records concept, you have to flatten the data somehow.
The operation that OpenRefine offers you for this is Fill Down, which will take a value and repeat it into all following empty rows (simplified).
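As a rough illustration of what filling down does to a column, here is a small Python sketch. It is a simplified model and not OpenRefine's implementation; the function name `fill_down` and the sample column names are hypothetical.

```python
def fill_down(rows, column):
    """Repeat the last non-blank value of `column` into the blank cells below it."""
    last = None
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left unchanged
        value = row.get(column)
        if value is None or value == "":
            if last is not None:
                row[column] = last
        else:
            last = value
        filled.append(row)
    return filled


if __name__ == "__main__":
    # Hypothetical rows after a multi-valued cell was split.
    rows = [
        {"name": "Alice", "tag": "a"},
        {"name": "", "tag": "b"},
        {"name": "Bob", "tag": "c"},
    ]
    for row in fill_down(rows, "name"):
        print(row)
```

After this step every row carries its own `name`, so the table can be read by tools that know nothing about records.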
Your second scenario (column splitting) is interesting because of the type of information involved.
New York City is a city and NY is a state, so you might want one column for the name of the city and one for the name of the state. For this, OpenRefine lets you use regular expressions, either to split the column or to add a new column based on the current one.
Most of the time this is enough to cleanly separate most of the data with some manual adjustments for the rest.
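To show the kind of pattern I mean, here is a minimal Python sketch. It assumes the cells look like "New York City, NY", that is, a city name, a comma, and a two-letter state code. That format is my assumption, so adjust the pattern to your data. The function name `split_city_state` is hypothetical, and values that do not match are returned with no state so they can be fixed by hand.

```python
import re

# Assumed cell format: "<city>, <two-letter state code>"
CITY_STATE = re.compile(r"^\s*(?P<city>.+?)\s*,\s*(?P<state>[A-Z]{2})\s*$")


def split_city_state(value):
    """Return (city, state); state is None when the value does not match."""
    match = CITY_STATE.match(value)
    if match is None:
        # Leave unmatched values for manual adjustment.
        return value.strip(), None
    return match.group("city"), match.group("state")


if __name__ == "__main__":
    for cell in ["New York City, NY", "Boston"]:
        print(split_city_state(cell))
```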
If you have more elaborate text, it is quite easy to call an ML-enabled API for named-entity recognition from OpenRefine. See, for example, "Using the OpenAI API to apply natural language queries to cells/data".
This does not answer your specific questions, as these are more along the lines of “How will OpenRefine evolve in the future?”. I am just trying to put your problem description in the context of how I would tackle it in OpenRefine.
Cheers Benjamin