I'm using OpenRefine 3.8.2 on Linux. What I'm trying to achieve is this:
Before:
rows
Column
1
value1;value2;value3;value4
2
value4;value2;value5
3
value4
In each row, the values are in no specific order (cf row 2 in the example).
After :
rows
value1
value2
value3
value4
value 5
1
x
x
x
x
2
x
x
x
3
x
Of course, instead of "x / empty" it could also be boolean values or "1/0" or any equivalent indication. I'd appreciate any suggestions and help!
b2m
February 12, 2025, 7:30am
2
This is called One Hot Encoding and yes you can do that with OpenRefine, but not in a single step. I gave an elaborate answer on this question in Can OpenRefine easily do One Hot Encoding? - Stack Overflow .
The main question is, whether you already know the whole set of possible values, or whether you need to determine them automatically.
If you know the set of values beforehand, you can use the following GREL expression:
with(
value.split(/\s*;\s*/),
cell_categories,
forEach(
["value1", "value2", "value3", "value4", "value5"],
category,
if(cell_categories.inArray(category), 1, 0)))
.join(";")
The algorithm is explained in more detail in my StackOverflow answer.
Once you have the One Hot Encoding in a single column, you can split the column into several columns .
Otherwise you first have to simulate a record to be to able access all the values in the value column. This is described in the second recipe in my StackOverflow answer.
This was exactly the answer I was hoping for. Thanks a lot.
1 Like