I have a dataset that I was not able to scrape into a more structured form. The result is blocks of text in which I can identify patterns that locate the right data. Creating the right regex statement has been challenging.
In the middle of a text block, there can be a line such as:
"Année d’inclusion à l’inventaire : 2021"
Which regex statement should I use to detect this? I have tried:
value.match(/.*Année d'inclusion à l'inventaire(.+)$/)[0]
value.find(/.*Année d'inclusion à l'inventaire(.+)$/)[0]
value.match(/.*|\nAnnée d'inclusion à l'inventaire(.+)$/)[0]
and some other variations, without success.
With match in OpenRefine, you have to define a regular expression that matches the complete content of the cell, whereas with find the regular expression only needs to match a substring of the content.
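The difference can be illustrated outside OpenRefine with a small Python sketch. This is only an analogy: `re.fullmatch` stands in for the whole-content behaviour of match and `re.search` for the substring behaviour of find. The sample text and variable names are made up for illustration, and OpenRefine uses its own regex engine rather than Python's.

```python
import re

# Hypothetical cell value: several lines, with the target line in the middle.
text = "Titre : Exemple\nAnnée d'inclusion à l'inventaire : 2021\nRégion : Québec"

pattern = r"Année d.inclusion à l.inventaire : \d+"

# Whole-content matching (analogous to match): fails, because the
# pattern does not account for the lines before and after the target.
whole = re.fullmatch(pattern, text)

# Substring matching (analogous to find): succeeds.
part = re.search(pattern, text)

print(whole)          # None
print(part.group(0))  # Année d'inclusion à l'inventaire : 2021
```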
As far as I understand, you are searching for a line with a variable part inside a multi-line text.
Usually you could use "modifiers" (flags such as multiline mode) to tell the regular expression engine to interpret $ as the end of a line instead of the end of the whole text.
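As a rough illustration of what such a modifier does, here is a Python sketch using `re.MULTILINE`. It shows the general regex behaviour only, not OpenRefine itself; the sample text is hypothetical.

```python
import re

# Hypothetical multi-line value with the target line in the middle.
text = "Titre : Exemple\nAnnée d'inclusion à l'inventaire : 2021\nRégion : Québec"

pattern = r"Année d.inclusion à l.inventaire(.+)$"

# Without the flag, $ only matches at the end of the whole text, and
# . does not cross line breaks, so nothing is found.
no_flag = re.search(pattern, text)

# With the multiline flag, $ also matches at the end of each line.
with_flag = re.search(pattern, text, re.MULTILINE)

print(no_flag)             # None
print(with_flag.group(1))  # " : 2021" (the variable part of the line)
```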
As far as I know, such modifiers are not supported with find in OpenRefine, but we can either use a more detailed regular expression or match everything up to the line break \n instead.
More detailed version:
value.find(/Année d.inclusion à l.inventaire : \d+/)[0]
Using linebreak:
value.find(/Année d.inclusion à l.inventaire[^\n]+/)[0]
Note: The OpenRefine forum encodes the apostrophe ' differently than (my) OpenRefine does, so I replaced the apostrophes with the universal . (which matches any single character) in the regular expressions.
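To check the "match up to the line break" pattern and the effect of replacing the apostrophes with ., here is a minimal Python sketch. The helper name `find_line` and the sample values are hypothetical, and Python's `re` module is used only as a stand-in for testing the pattern.

```python
import re

def find_line(value):
    """Return the first 'Année d'inclusion ...' line in value, or None."""
    # . stands in for either kind of apostrophe; [^\n]+ stops at the line break.
    m = re.search(r"Année d.inclusion à l.inventaire[^\n]+", value)
    return m.group(0) if m else None

# Same content, once with straight and once with typographic apostrophes.
straight = "Titre : Exemple\nAnnée d'inclusion à l'inventaire : 2021\nRégion : Québec"
curly = "Titre : Exemple\nAnnée d’inclusion à l’inventaire : 2021\nRégion : Québec"

print(find_line(straight))  # Année d'inclusion à l'inventaire : 2021
print(find_line(curly))     # Année d’inclusion à l’inventaire : 2021
print(find_line("Titre : Exemple"))  # None
```

Because . accepts any single character, the same pattern works regardless of which apostrophe ended up in the scraped text.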