Question about value.replace problem

Hello,

I'm stuck on something. I've tried to run the following GREL transformation across all columns in my table (i.e. I choose the "Transform" option under the "All" column):

value.replace("Chatman, Robby F.", "Chatman, Robert F.")

When I run it, the process only converts some the cell values that are Chatman, Robby F. When I use a conventional replace on a single column (i.e Edit cells>Replace), all values are successfully replaced.

I've found that sometimes I run into issues with find and replace because the spaces in between the strings differ from one string to the other. Namely, sometimes the space is a nonbreaking space, and sometimes they're conventional "space bar" spaces. So If I replace the first with the second, replacing works. However, in this case, I've double checked and all of the strings Chatman, Robby F. have conventional spaces, so the transformation should work.

I've never used the transform function across all columns so maybe I am missing something? My table is about 56,000 rows of data and 20-plus columns--perhaps the transform process times out at a certain limit?

Thanks for any insights you can provide!

You and add the following GREL value.replace(/[\p{Zs}\s]+/,' ') to collapse all consecutive whitespace before applying your transformation.

You can chain your two replaces in a single expression this way:

value.replace(/[\p{Zs}\s]+/,' ').replace("Chatman, Robby F.", "Chatman, Robert F.")

Using the All > Transform works the same as running the expression on each column individually. Under the Undo/Redo tab, you can confirm how many replace on each column were performed, and confirm if it missed a column or not.

Thanks, Martin. I removed any extra whitespace before I ran the transformation. And running it on an individual column, it changes some of the values but not all (same as the global transformation). Furthermore, find and replace works fine on these values in Excel! I'm assuming it's some hidden character/encoding issue that I can't figure out and which Excel forgives but OpenRefine doesn't. So I just wrote a Powershell script to find and replace on multiple values. Anyway, appreciate your response. It remains a mystery but at least I've figured out a workaround.

@js1980 Feel free to :+1: on this issue Add a new Hidden Chars facet · Issue #5207 · OpenRefine/OpenRefine · GitHub
if you think it would help you, and add any comments for the Facet design.

1 Like

Facets are a good tool to help debug things like this. If you set up a facet with your regex, you can then examine the non-matching values to see what they have in common.

Another useful tool, particularly if you suspect character encoding issues (e.g. composed vs decomposed Unicode sequences), are Unicode strings inspectors/debuggers, such as:
https://apps.timwhitlock.info/unicode/inspect?s=Chatman%2C+Robby+F.

Be sure to cut/paste your strings from OpenRefine's cell edit window to make sure that they are unchanged.

Lastly, if it is a Unicode issue, prefixing your regex with the Unicode flag (?U?), as I mention here might help.

If you figure out what the issue was, we'd be curious to know.

Tom