Trouble with SQL export

Hi. I'm getting strange errors when I import an SQL file (from a CSV file) made by OpenRefine into my MySQL database graphic interface - Sequel Ace. It used to work no problem, now I'm seeing:

[ERROR in query 1] You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '"_A" VARCHAR(255) NULL,

"N_N_N_N_N_N_N_N" VARCHAR(255) NULL

Is there something I'm supposed to be doing differently? I'm using Version 3.8.0 (TRUNK)

Hi and welcome!

Is it possible there's a special character (like an apostrophe) in one of your column names? That may be incorrectly interpreted by your SQL client. Is N_N_N_N_N_N_N_N the name of one of your columns?

If you're able to share a few lines from the faulty SQL file (in particular the CREATE statement) that could be helpful in diagnosing your issue.

Dear TimTom.

Thank you for your reply. You got it in one. Yes, when I open my CSV file in OpenRefine 3.8.0, it adds several special characters to the first column. Instead of just "A", it adds "ÿþA"

Yes, "NNNNNNNN", no spaces, is one of my columns.

My process is to download the CSV from a website, open it in OpenRefine, strip out a formatting line, then export it, either as a TSV or SQL (I've tried both over time, but yesterday nothing worked.) Then I import it into Sequel Ace, an open source SQL database management tool.

With my workflow, I don't see the CREATE statement, normally handled by the CSV import process in Sequel Ace unless it throws up an error. From an error I'm having right now, here is the CREATE statement:

Error adding new table:

An error occurred while trying to add the new table '20240503' by

CREATE TABLE 20240503 (
A VARCHAR(255),
NNNNNNNN VARCHAR(255),
NAME VARCHAR(255),
STATUS VARCHAR(255),
F VARCHAR(255),
MMMMMMMM VARCHAR(255),
NAME2 VARCHAR(255),
CCCCCCCC VARCHAR(255),
NAME3 VARCHAR(255),
STARTING_DATE VARCHAR(255),
ENDING_DATE VARCHAR(255),
REGISTRATION_DATE VARCHAR(255),
CLOSING_DATE VARCHAR(255)).

MySQL said: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '`N' at line 3

If you have any further tips, I'd love to hear them.

I can see a few things.

One is that the table name 20240503 may be an issue. As per the MySQL documentation: "Identifiers may begin with a digit but unless quoted may not consist solely of digits."

The problem is that the OpenRefine SQL exporter does not add quotes around table and column names (and I don't see an option to do so). So one thing you can do to fix this is to change the name of your table (which is the name of your OpenRefine project) so that it does not only contain digits.

If however you do need your table to be named 20240503, then you'll need to manually edit the SQL file generated by OpenRefine. It's a plain text file, so you can open it in e.g. TextEdit (if you're on a Mac) or Notepad (on Windows) and add back ticks on either side of the table name, e.g.

change CREATE TABLE 20240503 ( to CREATE TABLE `20240503` (

and try again.

The other thing that makes me pause are those special characters that you mention are added to the first column. This looks like an encoding issue: ÿþ is the UTF-16 Byte-order mark interpreted as Windows-1252 encoding.

When you first import your CSV into OpenRefine, try changing the value of the "Character encoding" field (just underneath the preview window) and set it to UTF-16 to see if it makes those characters disappear. That may solve some of your other issues as well, since working with incorrect encoding may add unexpected characters in unexpected places and potentially mess up your SQL statement.

Hope this helps.

Edit: I just noticed that the latest version of OpenRefine (3.8.0) now includes quotes around column names in the SQL exporter, but still not around the table name.

Really? you sure? I thought that was working correctly last I looked.

CREATE TABLE Clipboard (
"_____ABC" VARCHAR(255) NULL,
"2211" VARCHAR(255) NULL,
"_22ABC" VARCHAR(255) NULL
);
INSERT INTO Clipboard ("_____ABC","2211","_22ABC") VALUES 
( '111','222',null ),
( 'zzz','1212 ','1212' ),
( 'aaa','1212',null )

Yeah, seems like we missed quoting around the table name?
Please open an issue.

We went with double quotes " " because most database vendors use them as the default "quoted identifier" or "delimited identifier". (MySQL loves being different...always.)

1 Like

Yeah, sadly the double quotes in the table names wouldn't help in that particular case, since our friend mySQL expects backticks. It looks like offering the user the option to specify the escape character was previously discussed but discarded in favour of the double quote.

Hmm, we could add a toggle to the SQL Export dialog for -

Delimit names with:
Use "" double quotes
Use `` backticks

What do you think? If so, add that as a feature request.

1 Like

Hi Timtom,

Thanks for your reply and public-spiritedness. I had spent hours trying to figure this out and am glad to have access to your expertise.

I didn't know it would be an issue for your interpretation but I had edited the table name to remove easily google-able elements. The real table name is that date followed by four text characters, so that's not an issue.

Thank you for the suggestion to use TextEdit on a Mac to edit the exported SQL. That's what I had already been doing on the exported TSV/CSV, which is what I was using since SQL wasn't importing, either, but I didn't know I could do on the SQL file. I have just done this. You're teaching me new tricks.Thanks!

I had used TextEdit to open the original CSV and manually edit out the line that I used OpenRefine to remove, bypassing OpenRefine. That got me where I need to be while awaiting a workaround.

In later iterations I was dealing with Sequel Ace telling me the import was too long. I ending trying out reducing the field sizes. Very messy.

Here's what I saw in the SQL file.

You can see there's an underscore character added before the A VARCHAR(255) NULL,

CREATE TABLE 20240123IDENTIFIANTS_5_copy_csv (
_A VARCHAR(255) NULL,
NNNNNNNN INT NULL,
NAME1 VARCHAR(255) NULL,
STATUS VARCHAR(255) NULL,
F VARCHAR(255) NULL,
MMMMMMMM INT NULL,
NAME1 VARCHAR(255) NULL,
CCCCCCCC INT NULL,
NAME2 VARCHAR(255) NULL,
STARTING_MANAGEMENT DATE NULL,
ENDING_MANAGEMENT DATE NULL,
REGISTRATION_DATE DATE NULL,
CLOSING_DATE NULL
);
INS

On the part of your answer discussing the special characters that appear when I bring the CSV file into OpenRefine, thanks for that. I see that the Windows1252 character coding is pre-selected, and I didn't know enough about changing it. Thanks for the instructions for a workaround. I had read online that Window 1252 is a choice made when you don't know where the file will be opened, or something like that. I will give your suggestion a whirl.

Have a great weekend!

The issue has been found.
It appears this is an encoding regression of some kind in OpenRefine 3.8 which does not happen with 3.7.

I have opened a new critical issue for it here:

2 Likes

Ah, this explains a lot. Good sleuthing and thanks for opening the issue, and for your comments on allowing custom quotes for SQL exports.

1 Like