Transpose values into new columns that indicate the existence of these values

I'm using OpenRefine 3.8.2 on Linux. What I'm trying to achieve is this:

Before:

rows Column
1 value1;value2;value3;value4
2 value4;value2;value5
3 value4

In each row, the values are in no specific order (cf row 2 in the example).

After:

rows value1 value2 value3 value4 value 5
1 x x x x
2 x x x
3 x

Of course, instead of "x / empty" it could also be boolean values or "1/0" or any equivalent indication. I'd appreciate any suggestions and help!

This is called One Hot Encoding and yes you can do that with OpenRefine, but not in a single step. I gave an elaborate answer on this question in Can OpenRefine easily do One Hot Encoding? - Stack Overflow.

The main question is, whether you already know the whole set of possible values, or whether you need to determine them automatically.

If you know the set of values beforehand, you can use the following GREL expression:

with(
    value.split(/\s*;\s*/),
    cell_categories,
    forEach(
        ["value1", "value2", "value3", "value4", "value5"],
        category,
        if(cell_categories.inArray(category), 1, 0)))
.join(";")

The algorithm is explained in more detail in my StackOverflow answer.

Once you have the One Hot Encoding in a single column, you can split the column into several columns.

Otherwise you first have to simulate a record to be to able access all the values in the value column. This is described in the second recipe in my StackOverflow answer.

This was exactly the answer I was hoping for. Thanks a lot.

1 Like