Error while reading data, error message: Could not parse ‘DateTime’ as TIMESTAMP for field DateTime (position 0) starting at location 0 with message ‘Could not parse ‘DateTime’ as a timestamp. Required format is YYYY-MM-DD HH:MM[:SS[.SSSSSS]] or YYYY/MM/DD HH:MM[:SS[.SSSSSS]]’
But this is the beginning of my file, which looks as expected:
The error message indicates that BigQuery failed to parse a value in the DateTime column as a TIMESTAMP. The required format for TIMESTAMP values in BigQuery is YYYY-MM-DD HH:MM[:SS[.SSSSSS]] or YYYY/MM/DD HH:MM[:SS[.SSSSSS]].
From the sample data you provided, the format of the DateTime column appears correct. Note that the error says it could not parse the literal string ‘DateTime’ at position 0, which strongly suggests BigQuery is reading your header row as data and trying to interpret the column name “DateTime” as a TIMESTAMP value. It’s also possible that other rows in your dataset have incorrect or missing DateTime values.
Before resorting to more complex solutions, ensure the following:
The entire dataset adheres to the correct DateTime format.
The header row is skipped rather than read as data; for CSV loads this means setting skip_leading_rows to 1 (see the sketch below).
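As a minimal sketch of that second point, assuming a placeholder dataset my_dataset, table my_table, and file my_file.csv, a load job that skips the header row might look like this:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # treat the first row as a header, not data
    autodetect=True,  # or supply an explicit schema instead
)

with open("my_file.csv", "rb") as f:  # placeholder file name
    job = client.load_table_from_file(
        f,
        "my_dataset.my_table",  # placeholder dataset.table
        job_config=job_config,
    )
job.result()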
If the issue persists, a workaround is to first load the data as STRING and then cast it to TIMESTAMP:
INSERT INTO final_table (
  DateTime,
  Mth_yr,
  kWh_gas,
  kWh_elec
)
SELECT
  CAST(DateTime AS TIMESTAMP),
  Mth_yr,
  kWh_gas,
  kWh_elec
FROM staging_table;
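For the first half of that workaround, the staging table can be loaded with an explicit all-STRING schema so that nothing is parsed during ingestion. A minimal sketch, assuming the column and table names above and a placeholder file name:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    # Everything lands as STRING; the cast happens later in SQL.
    schema=[
        bigquery.SchemaField("DateTime", "STRING"),
        bigquery.SchemaField("Mth_yr", "STRING"),
        bigquery.SchemaField("kWh_gas", "STRING"),
        bigquery.SchemaField("kWh_elec", "STRING"),
    ],
)

with open("my_file.csv", "rb") as f:  # placeholder file name
    client.load_table_from_file(
        f, "my_dataset.staging_table", job_config=job_config
    ).result()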
Once the data is in the final table, you can query it as usual. This approach ensures that any discrepancies in the DateTime format are addressed before the final insertion into the desired table.
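If some rows might contain malformed timestamps, SAFE_CAST is a useful variation: it returns NULL instead of failing the whole statement, so you can isolate the offending rows before the final insert. For example:

SELECT DateTime
FROM staging_table
WHERE SAFE_CAST(DateTime AS TIMESTAMP) IS NULL;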
If you believe it’s a bug, especially given the consistency in your data formatting, it might be worth raising the issue with Google Cloud Support or checking BigQuery’s known issues to see if others have encountered a similar problem.
In the meantime, you can use the following Python code to ingest your large number of CSV files into BigQuery:
import os
from google.cloud import bigquery

# Create a BigQuery client.
client = bigquery.Client()

# Create (or reuse) a dataset to store your data in.
dataset = bigquery.Dataset(f"{client.project}.my_dataset")
dataset.location = "US"
dataset = client.create_dataset(dataset, exists_ok=True)

# Get a list of all the CSV files in the current directory.
csv_files = [f for f in os.listdir() if f.endswith(".csv")]

# Iterate over the CSV files and load them into BigQuery.
for csv_file in csv_files:
    # Configure the load job: skip the header row so it is not
    # parsed as data, and replace the table if it already exists.
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.CSV
    job_config.skip_leading_rows = 1
    job_config.autodetect = True
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE

    # Load the local file into a table named after the file.
    with open(csv_file, "rb") as f:
        job = client.load_table_from_file(
            f,
            dataset.table(csv_file[:-4]),
            job_config=job_config,
        )

    # Wait for the job to finish before moving on.
    job.result()

# Print a success message.
print("All CSV files have been loaded into BigQuery successfully!")
This code iterates over all the CSV files in the current directory and loads each one into its own table, skipping the header row and truncating any table that already exists. You can adapt it to your needs, for example by changing the destination dataset or the write disposition.
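If the files are numerous or large, it may be faster to stage them in Cloud Storage and load them all in a single job with a wildcard URI. A sketch, assuming a hypothetical bucket, prefix, and destination table:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)

# One load job covering every CSV under the prefix.
job = client.load_table_from_uri(
    "gs://my-bucket/csvs/*.csv",  # hypothetical bucket and prefix
    "my_dataset.combined_table",  # hypothetical destination table
    job_config=job_config,
)
job.result()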