Hello everyone!
I could really use some assistance!
I’ve set up a pipeline to copy data from my Cloud SQL (PostgreSQL) instance on GCP to BigQuery. I followed these guides:
- Google Cloud Datastream Documentation (this mentions that PG is supported)
- Reverse Proxy for Cloud SQL Connection
I successfully created a Dataflow template as outlined here: Datastream to BigQuery Template
The pipeline requires two BigQuery datasets: the main dataset and a staging dataset. After getting the pipeline running, I started seeing data being ingested into the staging dataset. The pipeline auto-created the columns and added the _metadata_* fields, but it did not add the _metadata_uuid and _metadata_lsn fields, which is causing the merge step to fail.
I checked the data in the GCS bucket that holds the delta files, and I can see an LSN field present in the metadata JSON there.
Does anyone know why the _metadata_uuid and _metadata_lsn fields were not auto-created in the BigQuery datasets? Any help would be greatly appreciated!
Thanks in advance!
Note: I was able to get the merge step to work by manually adding _metadata_uuid and _metadata_lsn columns, but this feels wrong, especially since I have dozens of tables.
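For anyone hitting the same thing, here is roughly how the manual workaround can be scripted across every table in the dataset instead of clicking through the UI. This is just a sketch using the google-cloud-bigquery Python client; the project/dataset names are placeholders for my setup, and the column types (STRING for the UUID, INT64 for the LSN) are my assumption — check them against the schema the template actually expects before running this.

```python
from typing import List

# Columns the Datastream-to-BigQuery merge step expects for a PostgreSQL
# source but that were not auto-created in my staging/replica tables.
# NOTE: types are assumed (UUID as STRING, LSN as INT64); verify against
# the template's expected schema before applying.
MISSING_COLUMNS = {
    "_metadata_uuid": "STRING",
    "_metadata_lsn": "INT64",
}


def build_alter_statements(project: str, dataset: str, tables: List[str]) -> List[str]:
    """Generate one ALTER TABLE ... ADD COLUMN IF NOT EXISTS statement per table."""
    statements = []
    for table in tables:
        adds = ", ".join(
            f"ADD COLUMN IF NOT EXISTS {name} {col_type}"
            for name, col_type in MISSING_COLUMNS.items()
        )
        statements.append(f"ALTER TABLE `{project}.{dataset}.{table}` {adds}")
    return statements


if __name__ == "__main__":
    # Hypothetical usage; "my-project" and "my_staging_dataset" are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()
    tables = [t.table_id for t in client.list_tables("my-project.my_staging_dataset")]
    for ddl in build_alter_statements("my-project", "my_staging_dataset", tables):
        client.query(ddl).result()  # run each ALTER TABLE and wait for completion
```

The IF NOT EXISTS clause makes the script safe to re-run, so it can also be pointed at the main dataset if the columns are missing there too.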
Actual error message the job produces (for every table):
Merge Job Failed With BigQuery Exception: com.google.cloud.bigquery.BigQueryException: Query error: Unrecognized name: _metadata_lsn; Did you mean _metadata_ssn? at [1:627] Statement: BEGIN BEGIN TRANSACTION; MERGE …
The _metadata fields that were auto-created are:
"_metadata_timestamp",
"_metadata_read_timestamp",
"_metadata_read_method",
"_metadata_source_type",
"_metadata_deleted",
"_metadata_change_type",
"_metadata_row_id",
"_metadata_scn", <---- THIS IS AN ORACLE FIELD?
"_metadata_ssn",
"_metadata_rs_id"