Encoding problem in cell's text

Fede_Jukev · November 1, 2024, 2:09pm

Dear All,

I noticed that some words in the text of my cells are split, e.g.:
"Fondare un nuovo mini stero della Cultura e della creatività (uno dei tanti nomi possibili} non sa rà, nell'immediato futuro, solo un optional per l'Italia:"

So I escaped the value using value.escape("javascript")

"Fondare un nuovo mini\u00AD stero della Cultura e della creativit\u00E0 (uno dei tanti nomi possibili} non sa\u00AD r\u00E0, nell'immediato futuro, solo un optional per l'Italia:"

Shall I proceed with value.replace("\u00AD", "") and how can I go back to the unescaped text?
Also I'm not sure why I have this issue in the first place.
I am quite new to data cleaning and OR.

Any insights would be very helpful and much appreciated.
Thank you very much.
Kind regards,
Fede

tfmorris · November 1, 2024, 5:54pm

unescape is the inverse of escape (here, value.unescape("javascript")), but OpenRefine also has infinite undo, so you can just undo the last operation (assuming you haven't done any other operations since then that you'd like to keep).

The character \u00AD is a soft hyphen (\u00E0 is just the accented letter à), but it appears to be followed by a space as well, so you probably want to remove both characters.
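
To check which escaped code point is which outside OpenRefine, something like the following minimal Python sketch can help. Python is an assumption here (the thread itself uses GREL), and `describe` is a hypothetical helper name; it only relies on the standard `unicodedata` module.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for every non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]

# Fragment of the cell text from the question, written with escapes.
sample = "mini\u00AD stero della creativit\u00E0"

for code, name in describe(sample):
    print(code, name)
# U+00AD SOFT HYPHEN
# U+00E0 LATIN SMALL LETTER A WITH GRAVE
```

Only U+00AD needs to be removed; U+00E0 is ordinary text and should be kept.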

Out of curiosity, what tool/editor produced the text with the embedded soft hyphens?

Tom

Fede_Jukev · November 1, 2024, 6:33pm

Hello Tom,

The original file is from an SQL table, downloaded as CSV and finally uploaded to OR.
The SQL table collects a large number of articles from a website that is now offline.
I don't know which tools were used at the time to write the articles or how the SQL file was produced (I will ask). The website closed many years ago.

I tried this, as you said, value.replace(/\u00AD\s/,"") and it looks like it worked to fix the problem.
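
For anyone doing the same cleanup outside OpenRefine, here is a rough Python equivalent of that expression. It is a sketch under the assumption that every unwanted soft hyphen is followed by one whitespace character, as in the sample text; `remove_soft_hyphens` is a hypothetical helper name.

```python
import re

# A soft hyphen (U+00AD) followed by exactly one whitespace character.
SOFT_HYPHEN_THEN_SPACE = re.compile("\u00AD\\s")

def remove_soft_hyphens(text):
    """Delete each soft hyphen together with the whitespace right after it."""
    return SOFT_HYPHEN_THEN_SPACE.sub("", text)

# Fragments of the cell text from the question, written with escapes.
print(remove_soft_hyphens("un nuovo mini\u00AD stero della Cultura"))
# un nuovo ministero della Cultura
print(remove_soft_hyphens("non sa\u00AD r\u00E0"))
# non sarà
```

A soft hyphen that is not followed by whitespace is left untouched by this pattern; changing it to "\u00AD\\s?" would remove those as well.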

Thank you very much for the help.

Fede
