Importing from CSV: only 50% appear in table

I’m importing a couple of CSV files to my database into two separate tables. I have tried numerous times and onyl 50% of the lines in the CSV appear as results in the table, e,g, my CSV file has 617833 lines and I import this files, but when I query the table to count the number of records then there’s only 308917 records. I have done the same on another table, importing 2.8 million rows and only 1.4 million make it to the table.

It appears to be missing every other line, so line 1 is imported, line 2 is not, line 3 is imported, line 4 is not etc.

Is there something I’m doing wrong?

Hello @StephenHind,

My first guess is that your CSV may be badly formatted, leading to two lines interpreted as one. Can you check in your database that the last column imported isn’t overfilled with data or plainly empty due to an import error?

You may be interested in looking at the gcloud sql import csv command, using --lines-terminated-by.

Specifies the character that split line records. The value of this argument has to be a character in Hex ASCII Code. For example, “0A” represents a new line. This flag is only available for MySQL. If this flag is not provided, a new line character will be used as the default value.

Thanks for the response. The line termination was my first thought, but the actual issue turned out to be the import didn’t like the double-quotes arround each bit of text, so I removed them and all the records were imported.

Damn, I hesitated to talk about this too :man_facepalming:

In case someone’s facing the same problem, here’s some informations about --quote=QUOTE :

Specifies the character that encloses values from columns that have string data type. The value of this argument has to be a character in Hex ASCII Code. For example, “22” represents double quotes. This flag is only available for MySQL and Postgres. If this flag is not provided, double quotes character will be used as the default value.

Thanks again; I’m sure this will yhelp others in the future.

Just so you know I waas not using this commant, but the web UI to import. Hopefully Google can improve that tool to be more resilliant with the file formatting or actually give you theses options in the web UI.

This usually comes down to how the parser is handling line breaks or quoted fields. If every other line’s missing, there’s a good chance the file has unescaped newlines inside quoted fields. I’ve seen that happen a lot with exports from Excel or some ETL tools.

I just pull it into pandas.read_csv() with strict quoting and let it fail fast using on_bad_lines=‘error’. If something’s wrong, it’ll throw right away. Sometimes I’ll grep for quote mismatches per line. It’s a quick way to catch rows that’ll break the import.

Ingest pipeline on our end includes Windsor for automated scheduled syncs, schema checks, and early row validation. That, paired with Postgres COPY or bcp for SQL Server, covers most edge cases.

Last time I hit this, tweaking the ESCAPE and QUOTE settings during import fixed it. File encoding and line endings are always worth checking, too.