Dear All,
I realized that there were some split words in the text of my cells, e.g.:
"Fondare un nuovo mini stero della Cultura e della creatività (uno dei tanti nomi possibili} non sa rà, nell'immediato futuro, solo un optional per l'Italia:"
So I escaped the value using value.escape("javascript") and got:
"Fondare un nuovo mini\u00AD stero della Cultura e della creativit\u00E0 (uno dei tanti nomi possibili} non sa\u00AD r\u00E0, nell'immediato futuro, solo un optional per l'Italia:"
Should I proceed with value.replace("\u00AD", ""), and how can I get back to the unescaped text?
Also, I'm not sure why I have this issue in the first place.
I am quite new to data cleaning and to OpenRefine (OR).
Any insights would be very helpful and much appreciated.
Thank you very much.
Kind regards,
Fede
unescape() is the inverse of escape() (e.g. value.unescape("javascript")), but OpenRefine also has unlimited undo, so you can simply undo the last operation (assuming you haven't done any other operations since then that you'd like to keep).
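To illustrate why escape and unescape round-trip, here is a minimal Python sketch, not OpenRefine's implementation. The helper names `js_style_escape` and `js_style_unescape` are hypothetical, and as a simplification only non-ASCII characters are escaped (as `\uXXXX` with uppercase hex, matching the output above); characters outside the Basic Multilingual Plane are not handled.

```python
import re


def js_style_escape(text: str) -> str:
    """Escape every non-ASCII character as a \\uXXXX sequence (uppercase hex).

    Simplification: assumes all characters are in the Basic Multilingual Plane.
    """
    return "".join(
        ch if ord(ch) < 128 else "\\u{:04X}".format(ord(ch))
        for ch in text
    )


def js_style_unescape(text: str) -> str:
    """Turn \\uXXXX sequences back into the characters they encode."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


sample = "non sa\u00AD r\u00E0"
escaped = js_style_escape(sample)
print(escaped)                                # non sa\u00AD r\u00E0
print(js_style_unescape(escaped) == sample)   # True
```

Because unescaping reverses escaping exactly for this input, applying one after the other restores the original cell value, which is the same effect as undoing the escape step.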
The character \u00AD is a soft hyphen (\u00E0 is just "à"), but in your text it appears to be followed by a space as well, so you probably want to remove both characters.
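If it is unclear which invisible or accented characters a cell contains, Python's standard `unicodedata` module can name them. This is a small sketch outside OpenRefine; the helper `describe_non_ascii` is hypothetical and simply lists each distinct non-ASCII character with its code point and Unicode name.

```python
import unicodedata


def describe_non_ascii(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for each distinct non-ASCII character."""
    seen: list[tuple[str, str]] = []
    for ch in text:
        if ord(ch) > 127:
            entry = ("U+{:04X}".format(ord(ch)), unicodedata.name(ch, "<unnamed>"))
            if entry not in seen:
                seen.append(entry)
    return seen


cell = "non sa\u00AD r\u00E0, nell'immediato futuro"
for code, name in describe_non_ascii(cell):
    print(code, name)
# U+00AD SOFT HYPHEN
# U+00E0 LATIN SMALL LETTER A WITH GRAVE
```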
Out of curiosity, what tool/editor produced the text with the embedded soft hyphens?
Tom
Hello Tom,
The original file comes from an SQL table that was exported as CSV and then imported into OR.
The SQL table contains a large number of articles from a website that is now offline.
I don't know which tools were used at the time to write the articles or to produce the SQL file (I am going to ask). The website closed many years ago.
As you suggested, I tried value.replace(/\u00AD\s/, "") and it seems to have fixed the problem.
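For anyone doing the same cleanup outside OpenRefine, the GREL pattern /\u00AD\s/ has a direct Python equivalent with `re.sub`. This is a sketch under that assumption; `join_split_words` is a hypothetical name, and the exact set of characters `\s` matches may differ slightly between Python and OpenRefine's Java-based regex engine.

```python
import re

# A soft hyphen (U+00AD) followed by exactly one whitespace character,
# mirroring the GREL pattern /\u00AD\s/.
SOFT_HYPHEN_BREAK = re.compile(r"\u00AD\s")


def join_split_words(text: str) -> str:
    """Remove every soft hyphen that is followed by one whitespace character."""
    return SOFT_HYPHEN_BREAK.sub("", text)


cell = ("un nuovo mini\u00AD stero della Cultura e della creativit\u00E0 "
        "(uno dei tanti nomi possibili} non sa\u00AD r\u00E0")
print(join_split_words(cell))
# un nuovo ministero della Cultura e della creatività (uno dei tanti nomi possibili} non sarà
```

Soft hyphens that are not followed by whitespace are left untouched by this pattern; making the whitespace optional (`\u00AD\s?`) would remove those as well.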
Thank you very much for the help.
Fede