I need it to take the result of a regular expression and pipe it into the toUppercase function. I'm new to GREL and can't figure it out. Here's the use case.
Using Common Transformations > To titlecase to clean up mixed-case names works well except on hyphenated last names which get transformed to something like this
John-doe
I created a regex expression to find the character after the hyphen but I can't figure how how to use the GREL replace() function to transform the result of the regex into an uppercase letter. The function
value.replace(/(?<=[-])./,"X")
works as expected and transforms John-doe into John-Xoe
But when I try something like this
value.replace(/(?<=[-])./, toUppercase())
the GREL function toUppercase doesn't know what result to transform. It's expecting a string. How is this done?
Welcome Greg @goyosoyo !
I'll update our docs, because it looks like we're still missing the delimiter argument that you can pass into toTitlecase(string, delimiter)
. Sorry about that!
Try:
value.toTitlecase(" -")
which will capitalize letters appearing after spaces and after a hyphen.
Add any more delimiter chars as necessary, and you might be careful and want to do this in combination with a Facet that narrows or filters to only rows where your cells values contain the delimiters, such as each one in the set of -'_+
perhaps depending on what's in your cells values. Update a Text Filter or Facet and transform cells with toTitlecase()
with the necessary individual delimiter. Rinse and repeat until you've covered the set of delimiters for your cells values that you want values to be converted to titlecase.
Thanks for the warm welcome. A cleaner, simpler solution is always welcome. Thank you. Greg
Hello.
My situation is very similar to @goyosoyo
I've already done dozens of searches in the OpenRefine documentation, on Google, etc.
On ChatGPT I made about 15 prompts and nothing.
I deal reasonably with Regex, but in OpenRefine I can't change the case of a string.
I need to change the text
Tribunal Regional do Trabalho - Trt
to
Tribunal Regional do Trabalho - TRT
Could anyone give some direction for this?
Thanks
Hi,
I think it should work with:
value.replace("Tribunal Regional do Trabalho - Trt","Tribunal Regional do Trabalho - TRT")
Best!
I guess there are multiple cases you want to handle, ChatGPT advised me for making the last "word" in uppercase with:
value.split(" ").slice(0, -1).join(" ") + " " + value.split(" ").slice(-1)[0].toUpperCase()
When you are trying to solve this type of problem you need to give some thought to what the general patterns are across your complete dataset - because the challenge is usually not changing a single value, but changing all the similar variations in your dataset.
If it is really just this one phrase repeated across your dataset then it could be done by creating a text facet (Exploring facets | OpenRefine) and doing the edit in the facet (editing a facet value changes that value for all cells in the column), or by the suggestion from @HannaMeiners
However, if you have different variations of values where the pattern is:
<full name> - <abbreviation>
and you need the to always be in uppercase, then an approach like @Nico_Altink suggests would be better. The expression Nico suggests from ChatGPT is OK, but there are two issues with it:
- a minor issue: there is a typo - it needs to be
toUppercase()
not toUpperCase()
GREL functions | OpenRefine
- a bigger issue: it is a very general expression which always converts the last word to uppercase which could result in making changes where they aren't needed
Assuming the pattern is
<full name> - <abbreviation>
I'd suggest
value.split(" - ")[0]+" - "+value.split(" - ")[1].toUppercase()
or to avoid a little repetition you can use the with()
control
with(value.split(" - "),a,a[0]+" - "+a[1].toUppercase())
Thanks for correcting ChatGPT, as indeed I forgot to test the expression.
However being a bit picky I can also discuss your " - " split, as it will fail when there's no "-" separator, thus my choice to use the last space as separator of a "word" was regarded more robuste.
I do have a similar function coded in VBA that's based on showing just the capitals in the selected text.
In OpenRefine this could be done by first splitting the column text by your " - " separator and then extract the capitals from the intended text. This does give for free an additional test for description without the "-" separator.
I agree - if your requirement is to capitalise the last word then the split function needs to match this purpose and the option you give would work here.
My point of it being general is if (for example) you had a mixture of values like
Tribunal Regional do Trabalho - Trt
Tribunal Regional do Trabalho
then in that case you don't want to capitalise the last word for the second line but do something else. This is what I meant by saying the ChatGPT solution here
is a very general expression which always converts the last word to uppercase which could result in making changes where they aren't needed
But there are many approaches and many variations you can take here. This is why I say first:
When you are trying to solve this type of problem you need to give some thought to what the general patterns are across your complete dataset - because the challenge is usually not changing a single value, but changing all the similar variations in your dataset.
Without seeing the whole data set it is not really possible to say which approach is the right one here - this is something @DimasJusto will need to work out I think
Hi HannaMeiners.
I ended up forgetting to write that this is just an example. I have around 1,350 occurrences with different texts.
I resolved the situation by exporting to .tsv and dealing with regex in Notepad ++.
Thank you for your contribution.
Best.
Hi @DimasJusto. Just in case you hadn’t realised you can use regular expressions in OpenRefine (in the replace and other functions)
It would be interesting to know why you decided it was easier to do outside OpenRefine in the end as we’re always interested in how we can improve OpenRefine