GREL for exact word matching between columns

Sandra · November 8, 2023, 1:38pm

For a client I'm trying to find a GREL expression that helps me create a column with True/False values, depending whether a keyword occurring in column 1 exactly matches a full separate word in a longer text string in column 2. I don't know GREL or regex, and usually piece things together from examples, but can't figure this one out.

For an easier case (just matching the string), I have figured out (creating new column based on the existing keywords column): if(cells["Beschrijving"].value.find(value).length() > 0, "True", "False")

But I'd like to obtain the following results:

Trefwoord	Beschrijving	Match?
tand	Scherpe tanden	False
dirigent	Eerst was ze dirigent, daarna chef-dirigent	True
rivier	De Congo-rivier	True
papyrus	De professor onderzocht het papyrus, en moest toen onbedaarlijk niezen	True
school	School en werk	True

What GREL to use to create the Match? column for the examples above?

Help is much appreciated. Thank you!

ostephens · November 8, 2023, 3:37pm

It feels like there should be a simpler way of doing this, but the challenge is avoiding matches on substrings from words in the Beschrijving column (i.e. avoid tand positively matching tanden). There are probably a few ways of working around this but I'm going with:

filter(cells.Beschrijving.value.split(/\W/),v,v==value).length()>0

The work is being done here using the regular expression /\W/ which matches 'non-word' characters - so essentially finds a list of words in the string treating spaces and punctuation as the start/end of a word. You could also use \b (word boundary characters) here instead

filter(cells.Beschrijving.value.split(/\b/),v,v==value).length()>0

For the examples you've shared these should get the same outcome, but as I understand it there are some differences between exactly what characters \W and \b match to (and the latter might perform better for some unicode characters) e.g.
"pä".split(/\b/) -> [ "pä" ]
"pä".split(/\W/) -> [ "p" ]

It looks from the school row you want the match to ignore case. So you could do:

filter(cells.Beschrijving.value.split(/\b/),v,v.toLowercase()==value.toLowercase()).length()>0

or possibly extending this a little further if you wanted to avoid accented characters:

filter(cells.Beschrijving.value.split(/\b/),v,v.fingerprint()==value.fingerprint()).length()>0

As an aside because you are just looking for a true/false outcome you don't need to actually do the if test, you can just have the expression and it will output a Boolean true/false that will be stored in the cell. If you want to store it as a string instead of a boolean (although I'm not sure there is any advantage to this) you can use

(filter(cells.Beschrijving.value.split(/\b/),v,v.fingerprint()==value.fingerprint()).length()>0).toString()

Topic		Replies	Views
Combining the if and contains GREL functions to add a new column based on existing columns Support and Helpdesk	4	180	January 17, 2025
Is there a GREL function to alphabetically compare strings? Support and Helpdesk	14	262	February 3, 2024
Creating a conditional (if-then) regex Support and Helpdesk	4	557	June 12, 2023
VBQ (Very Basic Question): Add a column based on this column (but only with part of the string?) Support and Helpdesk hints-and-tips	2	311	March 16, 2023
Help me with the logic on this one. == instead of && in Expression Support and Helpdesk	3	33	July 26, 2024

GREL for exact word matching between columns

Related topics