Datastream - Estimating costs after POC success

Hello,

I have a few Cloud SQL (Postgres) production instances and would like to set up Datastream streams to replicate all of their databases to BigQuery.

I did a POC in dev and it works like a charm; now I’m considering going further, all the way to prod.

I’m mainly concerned about the costs and would like to calculate an estimate using https://cloud.google.com/datastream/pricing, but I’m not sure how to get something accurate.

I’ve considered looking at the “Cloud SQL Database - Received bytes” metric in Cloud Monitoring and taking its sum (or average * duration) for all instances over a month, but I’m not sure that’s the right metric. I can’t find anything closer, though.

Any ideas how to better estimate costs?

Thanks!

Datastream doesn’t bill separately for resources such as CPU and memory, or for the number of streams and their uptime; those costs are bundled into the per-GB pricing. Unless Google Cloud changes the pricing model, there is nothing extra to calculate on that side.

The primary cost driver for Datastream is the volume of data replicated from your Cloud SQL instances to BigQuery, charged per GB of data processed.

A good starting point is the “Cloud SQL Database - Received bytes” metric, but note what it actually measures: all network bytes received by the instance, including query traffic, not just writes. Treat it as a rough upper bound rather than the replicated volume. What Datastream actually processes depends on the scope of replication (all tables or just a subset) and on the frequency of data changes (inserts, updates, deletes) in those tables.
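If you’d rather pull that number programmatically than eyeball Metrics Explorer, here’s a minimal sketch using the google-cloud-monitoring Python client. It assumes the metric type cloudsql.googleapis.com/database/network/received_bytes_count (the underlying name of “Received bytes”) and a placeholder project ID:

```python
# Sketch: sum a month of Cloud SQL "Received bytes" across all instances.
# Assumes pip install google-cloud-monitoring and that YOUR_PROJECT_ID
# is replaced with a real project.
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project = "projects/YOUR_PROJECT_ID"

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 30 * 24 * 3600}, "end_time": {"seconds": now}}
)

# Sum each series into daily buckets, then sum across all instances.
aggregation = monitoring_v3.Aggregation(
    {
        "alignment_period": {"seconds": 24 * 3600},
        "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_SUM,
        "cross_series_reducer": monitoring_v3.Aggregation.Reducer.REDUCE_SUM,
    }
)

results = client.list_time_series(
    request={
        "name": project,
        "filter": 'metric.type = "cloudsql.googleapis.com/database/network/received_bytes_count"',
        "interval": interval,
        "aggregation": aggregation,
    }
)

total_bytes = sum(point.value.int64_value for ts in results for point in ts.points)
print(f"Received over 30 days: {total_bytes / 1e9:.1f} GB")
```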

Datastream has two cost components to think about. Ongoing change data capture (CDC) reads only row changes (inserts, updates, deletes), so its steady-state volume tracks your rate of change. The optional backfill takes an initial snapshot of the existing rows before CDC takes over, adding a one-time cost roughly proportional to the size of the tables being replicated.
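To see how backfill changes the picture, a back-of-the-envelope split between the one-time snapshot and the recurring CDC charge might look like the sketch below. The per-GB rates and sizes are placeholders, not quotes from the pricing page; substitute the current numbers from https://cloud.google.com/datastream/pricing:

```python
# Back-of-the-envelope split: one-time backfill vs. recurring CDC.
# All numbers are placeholders; take real per-GB rates from the
# Datastream pricing page and real sizes from your own databases.
BACKFILL_RATE_PER_GB = 0.048  # $/GB, hypothetical
CDC_RATE_PER_GB = 0.048       # $/GB, hypothetical

database_size_gb = 500    # total size of the tables being snapshotted
monthly_change_gb = 375   # estimated CDC volume per month

print(f"One-time backfill: ${database_size_gb * BACKFILL_RATE_PER_GB:.2f}")
print(f"Recurring CDC:     ${monthly_change_gb * CDC_RATE_PER_GB:.2f}/month")
```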

Datastream compresses data in transit, which can significantly reduce the volume transferred. Factor compression into your estimate, but verify the effect against an actual bill (your dev POC is a good source), since the metered “data processed” may not map one-to-one to raw source bytes.

Additional Tips:

  • Historical Trends: Analyze historical Cloud Monitoring data to identify patterns in data change volume over time.
  • Pilot Testing: Conduct a limited pilot test on a subset of your production data to obtain real-world cost data.
  • Stream Metrics: Inspect the Cloud Monitoring metrics of your existing dev stream (throughput, events processed) and scale them by your dev-to-prod traffic ratio for a data-driven estimate.
  • Google Cloud Pricing Calculator: Input your estimated data volumes into the pricing calculator for a ballpark figure.

Example Calculation: Assume Cloud Monitoring shows an average of 50 GB of “Cloud SQL Database - Received bytes” per day across all instances. If 50% of that is actually replicated (25 GB) and compression halves it again, you are left with 12.5 GB per day, or roughly 375 GB per month. At an assumed rate of $0.048 per GB processed, the estimated monthly cost would be 375 GB * $0.048/GB = $18.
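And here is the same arithmetic as a small function, in case you want to plug in your own measured numbers; the 50% replication share and 50% compression factor are the example’s assumptions, not measured values:

```python
# The example's arithmetic as a reusable function. The 50% replication
# share and 50% compression factor are assumptions; replace them with
# numbers measured from your own pilot.
def estimate_monthly_cost(
    received_gb_per_day: float,
    replicated_fraction: float = 0.5,  # share of received bytes actually replicated
    compression_factor: float = 0.5,   # fraction remaining after compression
    price_per_gb: float = 0.048,       # $/GB processed; check the pricing page
    days_per_month: int = 30,
) -> float:
    daily_gb = received_gb_per_day * replicated_fraction * compression_factor
    return daily_gb * days_per_month * price_per_gb

print(f"${estimate_monthly_cost(50):.2f}/month")  # -> $18.00/month
```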