I'm trying to process a TSV file via OpenRefine so I can import it into Sequel Ace. I am running into many errors despite support from Thomas Guignard.
This is the latest, from importing an OpenRefine SQL export into Sequel Ace:
[ERROR in query 1] You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '??RCAEETT BAELI EDTNFIAITN_SIAMFc_vs5_i_pmro_tst v"
"AV RAHCRA2(55 )UNLL"
_N_N_N' at line 1
If I persist, I can sometimes get the file to import, but it shows only the first character of every word.
None of these characters occur in my data. I had selected UTF-16 on Thomas's recommendation. At first glance, this looks like an encoding error.
Or is it evidence that I should reinstall?
It starts like this: the CSV fields, if you click on them, reveal the full field name. But once imported, each field shows only a single character.
A TSV file is a Tab Separated Values file.
A CSV file is a Comma Separated Values File.
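For a parser, the only difference between the two is the delimiter character. A minimal sketch in Python (the sample strings are just for illustration):

```python
import csv
import io

# Sketch: the csv module reads both formats; only the delimiter changes.
comma = "name,city\nAda,London\n"
tab = "name\tcity\nAda\tLondon\n"
csv_rows = list(csv.reader(io.StringIO(comma)))
tsv_rows = list(csv.reader(io.StringIO(tab), delimiter="\t"))
print(csv_rows == tsv_rows)  # True: same data, different delimiter
```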
From inspecting their code, it appears that Sequel Ace can import both types of files, though their docs don't mention much at all. But I'm not familiar enough with its options, and I searched their GitHub Discussions and surprisingly didn't find anything about TSV.
I don't think this is an OpenRefine issue at first glance.
It seems to be a parsing issue with the TSV file and its structure in Sequel Ace.
So we might try another SQL tool to see how things fare.
I do know that many of our database users who also use OpenRefine for data wrangling are using DBeaver Community (Free Universal Database Tool), which supports MySQL and many other databases and runs on macOS, Windows, and Linux.
Perhaps give DBeaver a chance, and see if it has issues importing the TSV file that you are exporting from OpenRefine?
Also, if you could share your TSV file that you are exporting, then I can also take a look.
If it's private you can email me thadmguidry@outlook.com with the share link to the TSV file on whatever cloud storage you use. Or compress with zip, etc. and attach to your email message to me.
@gordonmx Thanks for emailing me privately.
It is a TSV file.
I've inspected it and verified that it is also UTF-16 LE with a Byte Order Mark (BOM), which is not an issue here.
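Incidentally, the interleaved characters in the error message at the top of the thread are typical of what UTF-16 text looks like when it's decoded with the wrong byte order or as a single-byte encoding. A minimal sketch (not your actual file):

```python
import codecs

# Sketch: UTF-16 LE text misread as a single-byte encoding keeps the
# NUL byte after each ASCII character, which produces garbled output
# much like the error message above.
raw = codecs.BOM_UTF16_LE + "CREATE TABLE".encode("utf-16-le")
print(raw[:2] == codecs.BOM_UTF16_LE)   # True: the BOM is present
misread = raw[2:].decode("latin-1")     # the wrong decoder, on purpose
print(misread.replace("\x00", "?"))     # C?R?E?A?T?E? ?T?A?B?L?E?
```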
I was able to import it into the default SQLite database in DBeaver, as well as into my local MySQL, after I removed the header row, which had a column that exceeded the VARCHAR(255) length.
1. Import the ZIP file and choose to parse it as a TSV.
2. Uncheck the "Store archive file" option.
3. Check the box for "Discard initial 0 row(s) of data" and change its rows value from 0 to 1.
4. Check the box for "Trim leading & trailing whitespace from strings" (because I saw lots of unnecessary whitespace that you probably don't care about).
Otherwise, you'll run into the following issues when you Export with the SQL format and try to load into MySQL:
The table will be created BUT NO ROWS INSERTED (because of the header hyphens row).
Data truncated in one of your columns; if you look at the MySQL Execution Log, you'll notice this error during INSERT:
[1406] Data truncation: Data too long for column 'xxx_xxxx_xxxxxx' at row 1
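To spot oversized values before importing, a quick check like this works (a sketch; the inline sample stands in for your file):

```python
import csv
import io

# Sketch: flag TSV values longer than 255 characters, which would trip
# MySQL's VARCHAR(255) limit during INSERT.
tsv = "short\t" + "x" * 300 + "\n"
flagged = []
for row in csv.reader(io.StringIO(tsv), delimiter="\t"):
    for i, value in enumerate(row):
        if len(value) > 255:
            flagged.append((i, len(value)))
print(flagged)  # [(1, 300)]
```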
Let me know if following steps 1-4 does indeed FIX the issue you were having (which really was just the VARCHAR(255) limit being exceeded by more than 255 chars in the xxx_xxxxxx_xxxxx column).
NOTE: Alternatively, you can simply change the maximum length in the column declarations in the SQL file, e.g. to VARCHAR(300) or up to the VARCHAR(65535) maximum described in the MySQL docs, or change them to use the TEXT data type.
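That edit can be scripted if there are many columns. A hedged sketch (the CREATE TABLE statement below is illustrative, not your actual export):

```python
import re

# Sketch: widen every VARCHAR(255) column in the exported SQL DDL to TEXT.
ddl = 'CREATE TABLE IF NOT EXISTS "my_table" ("long_col" VARCHAR(255) NULL)'
patched = re.sub(r"VARCHAR\(255\)", "TEXT", ddl)
print(patched)  # CREATE TABLE IF NOT EXISTS "my_table" ("long_col" TEXT NULL)
```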