Trying Datastream for Postgres

Hello, fellow Googlers. My name is Farizi, and I currently work as a data engineer. Right now I am trying out the newly released Datastream for Postgres.

The flow is like this:

  1. Create a dummy Cloud SQL Postgres instance
  2. Create connection profiles for the Postgres DB and Cloud Storage in Datastream
  3. Create a Pub/Sub topic and subscription for GCS notifications
  4. Create a Dataflow job using the Datastream to BigQuery template
  5. Start the stream
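
For anyone trying to reproduce this, steps 3 and 4 can be sketched roughly as below. This is only a sketch under assumptions: every resource name (project, bucket, topic, subscription, dataset templates, job name) is a hypothetical placeholder, and the gcloud flags and template parameter names are written from memory of the public docs, so please verify them before running anything. The code only builds the commands as argument lists; it does not execute them.

```python
# Hypothetical placeholder values; replace with your own resources.
PROJECT = "my-project"
REGION = "asia-southeast1"
BUCKET = "my-datastream-bucket"
TOPIC = "datastream-gcs-notifications"
SUBSCRIPTION = "datastream-gcs-notifications-sub"
STREAM = "datastream-postgres-test"


def build_commands():
    """Return gcloud commands for steps 3 and 4 as argument lists."""
    # Parameters for the Datastream to BigQuery Dataflow template.
    # Names are assumed from the public template docs; verify before use.
    params = {
        "inputFilePattern": f"gs://{BUCKET}/",
        "gcsPubSubSubscription": f"projects/{PROJECT}/subscriptions/{SUBSCRIPTION}",
        "inputFileFormat": "avro",
        "outputStagingDatasetTemplate": "staging",
        "outputDatasetTemplate": "replica",
        "deadLetterQueueDirectory": f"gs://{BUCKET}/dlq/",
        "streamName": f"projects/{PROJECT}/locations/{REGION}/streams/{STREAM}",
    }
    joined = ",".join(f"{name}={value}" for name, value in params.items())
    template = (
        f"gs://dataflow-templates-{REGION}/latest/flex/Cloud_Datastream_to_BigQuery"
    )
    return [
        # Step 3: topic, subscription, and GCS notification.
        ["gcloud", "pubsub", "topics", "create", TOPIC],
        ["gcloud", "pubsub", "subscriptions", "create", SUBSCRIPTION,
         f"--topic={TOPIC}"],
        ["gcloud", "storage", "buckets", "notifications", "create",
         f"gs://{BUCKET}", f"--topic={TOPIC}", "--payload-format=json"],
        # Step 4: Dataflow job from the Datastream to BigQuery template.
        ["gcloud", "dataflow", "flex-template", "run", "datastream-to-bq",
         f"--region={REGION}", f"--template-file-gcs-location={template}",
         f"--parameters={joined}"],
    ]


if __name__ == "__main__":
    for command in build_commands():
        print(" ".join(command))
```

Each list could be passed to `subprocess.run(command, check=True)` once the values are confirmed. Note that `streamName` uses the same `projects/.../locations/.../streams/...` form that shows up in the Dataflow log.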

The pipeline is already running, but I have run into an issue: the schema of the streamed tables in BigQuery is not the same as the schema of the Postgres source. Can someone help? I know this is still in the preview stage, but any kind of help is appreciated.

I found this log in Dataflow, which might help debug the problem:

IOException Occurred: Failed to Retrieve Schema: projects/783341903604/locations/asia-southeast1/streams/datastream-postgres-test public.guestbook : java.io.IOException: Source Connection Profile Type Not Supported

Since this is a preview feature of Datastream, you can also report these findings in the Google Issue Tracker. Reports like this help the feature mature during its preview phase, while it is still being developed.


Can you try the direct Postgres to BigQuery stream?

Also, the schema will not be exactly the same. CDC captures changes from the log files, and some extra metadata is recorded as well, such as timestamps, which lets us determine the latest records (for SCD2 use cases).
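
To illustrate how that metadata is used, here is a minimal sketch of picking the latest record per key from CDC rows. The field names `id`, `source_timestamp`, and `is_deleted` are hypothetical stand-ins for illustration; the actual metadata columns depend on which pipeline you use.

```python
from typing import Any, Dict, Iterable, List


def latest_per_key(
    rows: Iterable[Dict[str, Any]],
    key: str = "id",                # hypothetical primary-key column
    ts: str = "source_timestamp",   # hypothetical CDC timestamp column
    deleted: str = "is_deleted",    # hypothetical delete-marker column
) -> List[Dict[str, Any]]:
    """Keep the most recent change per key and drop keys whose last change is a delete."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        current = latest.get(row[key])
        # A later (or equal) timestamp replaces the stored version.
        if current is None or row[ts] >= current[ts]:
            latest[row[key]] = row
    return [row for row in latest.values() if not row.get(deleted, False)]


# Sample change events, invented for this example.
changes = [
    {"id": 1, "message": "hello", "source_timestamp": 1, "is_deleted": False},
    {"id": 1, "message": "edited", "source_timestamp": 2, "is_deleted": False},
    {"id": 2, "message": "bye", "source_timestamp": 1, "is_deleted": False},
    {"id": 2, "message": "bye", "source_timestamp": 3, "is_deleted": True},
]
current_rows = latest_per_key(changes)
```

With the sample events, only the `edited` version of row 1 survives, since row 2 ends with a delete. For SCD2 you would keep every version instead and use the timestamps to set validity ranges.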


Sorry for the wait. After switching to the Postgres to BigQuery pipeline in Datastream, I no longer see the schema issue. I will post an update if I find another one, but I will accept this as the solution for now. Thanks!