I’m posting this here because I think this is an area where AI (which I’m generally very cautious about) could be of great help to OpenRefine users.
One of the harder things for me (and for the people I teach OpenRefine to) is finding the right snippets of scripts and expressions for what I need to do. Yes, you can search the web, and the answer may be in the documentation, but all that looking up is time-consuming. It can be frustrating if your exact case is not described anywhere, the documentation you find may be difficult to understand if you are not a coder, and the searching itself takes you out of the flow of the task at hand.
I haven’t mastered regex (frankly, I kind of stubbornly refuse to learn it, because most people I teach or help don’t know it either and their priorities are elsewhere). I’ve occasionally used https://www.autoregex.xyz/, which transforms natural language into regex, but the service is spotty.
I have just tried ChatGPT for the same thing. I don’t have a concrete project at the moment to test it on, but I tried the following, and I’ll leave it to the experts to check whether the output is correct.
Me: Give me a correct regular expression for “the last comma in this series of characters”
ChatGPT: This regular expression matches the last comma in a string of characters:
/,(?!.*,)/
The (?!.*,) part is a negative lookahead that asserts that there is no other comma ahead in the string.
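For anyone who wants to check this answer without an OpenRefine project at hand, here is a minimal sketch in Python. The sample string is made up, and Python's `re` module is only a stand-in: I am assuming the lookahead behaves the same way in OpenRefine's regex engine, which I have not verified here. The surrounding `/` delimiters are dropped because Python does not use them.

```python
import re

# Pattern from the ChatGPT answer, without the surrounding / delimiters.
LAST_COMMA = re.compile(r",(?!.*,)")

# Hypothetical test string, not taken from a real project.
sample = "apples, pears, plums, cherries"

# Position of the comma the pattern finds.
match = LAST_COMMA.search(sample)
last_comma_index = match.start()

# Replace only that comma, leaving the earlier ones untouched.
replaced = LAST_COMMA.sub(" and", sample)
print(replaced)
```

One caveat: in Python, `.` does not match a newline by default, so on a value that spans several lines this pattern would match the last comma on each line rather than only the very last one.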
Me: Give me a correct regular expression to retrieve a series of four numbers from a string of characters. The numbers are between rounded brackets, starting with the number 1
ChatGPT: This regular expression matches a series of four numbers between rounded brackets, starting with the number 1:
/\(1[0-9]{3}\)/
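Again as a rough check rather than a definitive test, this Python sketch runs the pattern on a made-up string. Note that the pattern as given matches the brackets too. The second pattern, with a capture group around the digits, is my own variation (not part of the ChatGPT answer) for the case where only the four digits are wanted.

```python
import re

# Pattern from the ChatGPT answer, without the surrounding / delimiters.
FOUR_DIGITS_IN_BRACKETS = re.compile(r"\(1[0-9]{3}\)")

# Variation (an assumption about what is wanted): capture only the digits.
DIGITS_ONLY = re.compile(r"\((1[0-9]{3})\)")

# Hypothetical test string.
sample = "First edition (1984), reprint (1999), revised (2001), id (12345)"

with_brackets = FOUR_DIGITS_IN_BRACKETS.findall(sample)
digits_only = DIGITS_ONLY.findall(sample)

print(with_brackets)
print(digits_only)
```

In this sample, (2001) is skipped because it does not start with 1, and (12345) is skipped because it has five digits.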
And here comes the banger…
Me: Create a piece of GREL code for OpenRefine in which I shorten the values in a column by removing the first 7 characters in each cell.
ChatGPT: Here is the GREL (Google Refine Expression Language) code to remove the first 7 characters from each cell in a column in OpenRefine:
value.substring(7)
You can use this expression in OpenRefine by selecting the column you want to modify, clicking on the “Edit cells” button, and selecting “Transform” from the drop-down menu. Then, you can paste the expression into the “Expression” field and click on “OK”. The changes will be applied to all cells in the column, removing the first 7 characters from each.
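To make the intended effect concrete, here is a small Python sketch of the same operation on some made-up cell values. The function name `drop_prefix` and the sample cells are hypothetical. This only mimics what `value.substring(7)` is described as doing and is not OpenRefine code.

```python
def drop_prefix(value: str, count: int = 7) -> str:
    """Return the value without its first `count` characters."""
    return value[count:]

# Hypothetical cell values; the first two start with a 7-character prefix.
cells = ["ID-001:Amsterdam", "ID-002:Utrecht", "short"]

shortened = [drop_prefix(cell) for cell in cells]
print(shortened)
```

In Python, a value shorter than 7 characters simply becomes an empty string. It is worth checking in OpenRefine's preview pane how such short cells come out before applying the transform to a whole column.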
I’ll try this kind of stuff ‘for real’ in my next project. For now, I’m just putting this here.