GCP Billing Export to BigQuery: Quick Guide to Tracking AI Costs

FinOps for AI/ML: Advanced Cost Management in Google Cloud

Overview and Purpose

Exporting your Cloud Billing data to BigQuery is the foundation for advanced cost management (FinOps), especially for complex AI/ML workloads such as Vertex AI and Gemini.

Key Benefits

  • Granular Cost Analysis: Drill down to resource-level usage (e.g., specific GPU hours, API calls).

  • Accurate Attribution: Segment costs by project, team, environment, or custom labels.

  • Custom Reporting: Use BigQuery SQL to build detailed reports, dashboards, and anomaly detection (see the sample query after this list).
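
To illustrate the kind of reporting the export enables, here is a minimal sketch that sums monthly cost per service, net of credits. The project (finops-project), dataset (billing_export), billing account ID suffix, and invoice month are placeholders; the table name follows the detailed-export naming pattern gcp_billing_export_resource_v1_<BILLING_ACCOUNT_ID>.

```sql
-- Monthly cost per service from the detailed billing export.
-- Project, dataset, account ID, and month are placeholders.
SELECT
  service.description AS service_name,
  SUM(cost) AS total_cost,
  -- Net cost after promotional / committed-use credits.
  SUM(cost) + SUM(IFNULL((SELECT SUM(c.amount) FROM UNNEST(credits) AS c), 0)) AS net_cost
FROM `finops-project.billing_export.gcp_billing_export_resource_v1_XXXXXX_XXXXXX_XXXXXX`
WHERE invoice.month = '202501'  -- YYYYMM of the invoice month to report on
GROUP BY service_name
ORDER BY total_cost DESC;
```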

Step-by-Step Setup Guide

1. Setup Destination

  • Project: Use a separate project for FinOps (best practice).

  • Dataset: Create a new BigQuery dataset.

  • Crucial: Choose a multi-region location (US or EU) to ensure you receive retroactive data (backfill); regional datasets only collect new data going forward. A scripted alternative is sketched after this list.
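
If you prefer to script the dataset creation instead of using the Console, BigQuery DDL lets you pin the location explicitly. The project and dataset names below are placeholders.

```sql
-- Create the destination dataset in a multi-region location so the
-- export can backfill retroactive data. Names are placeholders.
CREATE SCHEMA IF NOT EXISTS `finops-project.billing_export`
OPTIONS (
  location = 'US',
  description = 'Destination dataset for Cloud Billing export'
);
```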

2. Enable Export

  1. Navigate to Billing > Billing export in the GCP Console.

  2. Under the BigQuery export tab, enable:

    • Detailed usage cost data: MANDATORY for resource-level AI tracking.

    • Pricing data: Recommended for cost-vs-list analysis.

  3. Select your project/dataset and Save.

Note: Data typically starts flowing into BigQuery within a few hours.
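
Once the export is enabled, a quick sanity check like the sketch below (table name is a placeholder following the detailed-export pattern) confirms rows are arriving and shows how fresh they are; export_time is part of the billing export schema.

```sql
-- Verify the export is flowing and check data freshness.
SELECT
  MAX(export_time) AS latest_export,
  COUNT(*) AS rows_loaded
FROM `finops-project.billing_export.gcp_billing_export_resource_v1_XXXXXX_XXXXXX_XXXXXX`
WHERE export_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY);
```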


Tracking AI Services (Vertex AI & LLMs)

AI workloads are tracked using the service.description field and labels; labels are your primary mechanism for granular attribution. Sample queries follow the table below.

| Scenario | Field/Label to Focus On | Purpose |
| --- | --- | --- |
| General AI usage | service.description | Filter for all costs related to “Vertex AI,” “Cloud Storage,” etc. |
| Job/model tracking | labels.key & labels.value | Apply custom labels (e.g., model_name: v2-rec) to training jobs. |
| Vertex AI Pipelines | labels.vertex-ai-pipelines-run-billing-id | Automatically propagates to all sub-resources (VMs, storage) in a run. |
| Generative AI (LLMs) | sku.description | Track Gemini costs via token-usage SKUs; combine with project filters. |
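
To make the table above concrete, here are two hedged query sketches: one attributes Vertex AI cost per custom label, the other breaks out Gemini spend by SKU. The model_name label key, the LIKE filter, and all project/dataset/table names are assumptions for illustration.

```sql
-- Per-model Vertex AI cost attribution via a custom label.
-- 'model_name' is a placeholder label key you would apply to your jobs.
SELECT
  (SELECT l.value FROM UNNEST(labels) AS l WHERE l.key = 'model_name') AS model_name,
  SUM(cost) AS total_cost
FROM `finops-project.billing_export.gcp_billing_export_resource_v1_XXXXXX_XXXXXX_XXXXXX`
WHERE service.description = 'Vertex AI'
GROUP BY model_name
ORDER BY total_cost DESC;

-- Gemini/LLM spend by token-usage SKU (SKU names vary; adjust the filter).
SELECT
  sku.description AS sku_name,
  SUM(cost) AS total_cost
FROM `finops-project.billing_export.gcp_billing_export_resource_v1_XXXXXX_XXXXXX_XXXXXX`
WHERE LOWER(sku.description) LIKE '%gemini%'
GROUP BY sku_name
ORDER BY total_cost DESC;
```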

Critical Limitations & Solutions

To ensure accurate reporting, keep the following considerations in mind:

| Area | Issue to Watch Out For | Recommended Solution |
| --- | --- | --- |
| Data granularity | The standard export is insufficient for AI tracking. | Always enable “Detailed usage cost data.” |
| Retroactive data | Regional datasets do not receive backfilled data. | Use a multi-region (US/EU) location for the dataset. |
| Data lag | Billing data is not real-time (a few hours’ delay). | Use Cloud Monitoring/Budget Alerts for real-time warnings. |
| Schema changes | Raw table schema changes can break SQL queries. | Create BigQuery views to shield reports (example below). |
| GKE costs | GKE resource breakdowns aren’t included by default. | Manually enable GKE cost allocation in the GCP Console. |
| Shared costs | Shared VPC or BigQuery costs are hard to attribute. | Define internal allocation logic based on proportional usage. |
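
As an example of the schema-shielding advice in the table above, the view below pins reports to a small, stable set of columns so changes in the raw export table don’t break downstream SQL. The view and table names are placeholders.

```sql
-- A thin view exposing only the columns reports depend on; if the raw
-- export schema changes, only this view needs updating, not every report.
CREATE OR REPLACE VIEW `finops-project.billing_export.v_cost_summary` AS
SELECT
  invoice.month       AS invoice_month,
  project.id          AS project_id,
  service.description AS service_name,
  sku.description     AS sku_name,
  cost,
  usage_start_time
FROM `finops-project.billing_export.gcp_billing_export_resource_v1_XXXXXX_XXXXXX_XXXXXX`;
```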

Official References

This guide is based on official Google Cloud documentation and best-practice guides.
