Estimating Daily Write Volume in BigQuery (TB per Day)

Hi everyone,

I was asked to estimate how many TB of writes per day we’re making in BigQuery across our organization, spanning multiple GCP projects. I’ve explored the BigQuery audit logs and INFORMATION_SCHEMA views, but I haven’t been able to locate the necessary data.

Does anyone know where I can retrieve this information? Any guidance on the best way to estimate daily write volume would be highly appreciated.

Thanks in advance!

Hello,

Here are some potential approaches:

1. BigQuery Audit Logs:

  • Filter by Event Type: Focus on data-access log entries for operations that modify table data, such as load jobs, DML statements (INSERT, UPDATE, MERGE), and streaming inserts.
  • Examine JSON Payload: The `protoPayload` of these entries carries metadata such as the destination table, row counts, and, for load jobs, the number of bytes written.
  • Aggregation: Aggregate by project ID, table name, and day to calculate total write volume.
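
If you route the data-access audit logs to a BigQuery dataset with a log sink, you can aggregate write-side job statistics with a query along these lines. This is only a sketch: the table name (`my_project.my_logs.cloudaudit_googleapis_com_data_access`) is a placeholder, and the exact JSON field paths depend on your sink's schema; here it pulls `totalOutputBytes` for load jobs out of the `BigQueryAuditMetadata` payload.

```sql
-- Sketch: daily TB written by load jobs, from exported data-access audit logs.
-- Table name and JSON paths are assumptions to adapt to your sink schema.
SELECT
  DATE(timestamp) AS day,
  SUM(CAST(JSON_VALUE(protopayload_auditlog.metadataJson,
        '$.jobChange.job.jobStats.loadStats.totalOutputBytes') AS INT64))
    / POW(10, 12) AS load_tb_written
FROM `my_project.my_logs.cloudaudit_googleapis_com_data_access`
WHERE JSON_VALUE(protopayload_auditlog.metadataJson,
        '$.jobChange.job.jobStats.loadStats.totalOutputBytes') IS NOT NULL
GROUP BY day
ORDER BY day;
```

Note this only covers load jobs; DML and streaming writes would need additional fields or one of the other approaches below.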

2. INFORMATION_SCHEMA Views:

  • INFORMATION_SCHEMA.TABLE_STORAGE: BigQuery does not expose a TABLE_STATISTICS view; the view that reports per-table row counts and logical byte sizes is TABLE_STORAGE (with TABLE_STORAGE_BY_ORGANIZATION for a cross-project rollup).
  • Estimate Write Volume: Compare daily snapshots of logical bytes to approximate net growth, or combine row counts with an assumed average row size. This captures net change, not gross writes. If you use streaming inserts, the STREAMING_TIMELINE_* views also report ingested bytes per minute.
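
As a sketch of the snapshot approach (the project name and `region-us` qualifier are placeholders for your own):

```sql
-- Sketch: current logical bytes per project, in TB.
-- Run once a day and difference consecutive snapshots to estimate net growth.
SELECT
  project_id,
  SUM(total_logical_bytes) / POW(10, 12) AS logical_tb
FROM `my_project.region-us.INFORMATION_SCHEMA.TABLE_STORAGE_BY_ORGANIZATION`
GROUP BY project_id;
```

Differencing snapshots gives a rough net-growth figure; it will undercount gross writes whenever data is also deleted or overwritten during the day.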

3. Cloud Monitoring:

  • Built-in and Custom Metrics: BigQuery publishes built-in Cloud Monitoring metrics for ingestion (for example, `storage/uploaded_bytes`); you can supplement these with custom metrics for anything not covered.
  • Alerting: Set up alerts to monitor write volume and identify potential anomalies.

4. Billing Data:

  • Export Billing Data: Enable the standard (or detailed) Cloud Billing export to a BigQuery dataset.
  • Analyze Usage: Query the export for BigQuery SKUs related to ingestion (e.g. streaming inserts) to see billed write-related usage per day.
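
A sketch of the billing-export approach; the export table name (`my_project.billing.gcp_billing_export_v1_XXXXXX`) and the SKU filter are placeholders you would adapt to your own export and the SKUs that actually appear in it:

```sql
-- Sketch: daily billed usage for BigQuery ingestion-related SKUs.
SELECT
  DATE(usage_start_time) AS day,
  sku.description AS sku,
  SUM(usage.amount_in_pricing_units) AS usage_in_pricing_units,
  ANY_VALUE(usage.pricing_unit) AS pricing_unit
FROM `my_project.billing.gcp_billing_export_v1_XXXXXX`
WHERE service.description = 'BigQuery'
  AND LOWER(sku.description) LIKE '%insert%'
GROUP BY day, sku
ORDER BY day;
```

Keep in mind that billing only reflects billable ingestion (e.g. streaming inserts); batch load jobs are free and therefore will not show up here.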

The best approach depends on your specific requirements and the level of detail you need in your estimate.