How to design a multi-project GCP cost control system with per-project auto shutdown using Terraform, Python Cloud Functions, and Budget alert notifications (Pub/Sub/email)?

I work at a cloud lab rental company that provides temporary GCP environments for students to practice and learn. As part of improving our platform, I am designing a cost-control system in Google Cloud Platform (GCP) that spans multiple projects.

Goal:
I want to automatically stop, scale down, or disable all billable resources within a project when its defined budget threshold is exceeded, in order to prevent further costs.

Scope:

  • The solution should work across multiple projects.

  • Each project must be handled independently. If one project exceeds its budget, only that project’s resources should be affected, without impacting other projects.

Budget notification requirements:

  • When a project reaches 50% and 75% of its budget, notification emails should be sent to a central/management account.

  • When the project reaches 100% of its budget, an email notification should be sent, and an automated action should be triggered to stop, scale down, or disable all stoppable resources within that specific project.

  • The automation must strictly apply only to the project that exceeded its budget, not to other projects.
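A minimal sketch of this per-project budget layout, written as a Python script that emits Terraform JSON syntax (`*.tf.json`) for `google_billing_budget` resources: one budget per project, filtered to that project only, with 50%, 75% and 100% threshold rules, a shared Pub/Sub topic, and Cloud Monitoring notification channels for the central email address. The billing account ID, topic name, channel IDs and the `budget-<project_id>` display-name convention are placeholders and assumptions, not part of the original setup. The budget filter is expected in the form `projects/<project_number>`, so the sketch takes the project number alongside the ID.

```python
import json

# Placeholders / assumptions: replace with your own values.
BILLING_ACCOUNT = "000000-000000-000000"
PUBSUB_TOPIC = "projects/cost-control-admin/topics/budget-alerts"  # hypothetical
NOTIFICATION_CHANNELS = [
    "projects/cost-control-admin/notificationChannels/123",  # hypothetical email channel
]


def budget_resource(project_id: str, project_number: str, amount_usd: int) -> dict:
    """One google_billing_budget block (Terraform JSON) scoped to a single project."""
    return {
        "billing_account": BILLING_ACCOUNT,
        # Hypothetical convention that lets the function map alerts back to a project.
        "display_name": f"budget-{project_id}",
        "budget_filter": {"projects": [f"projects/{project_number}"]},
        "amount": {
            "specified_amount": {"currency_code": "USD", "units": str(amount_usd)}
        },
        # Repeated blocks are written as a JSON list in Terraform JSON syntax.
        "threshold_rules": [
            {"threshold_percent": 0.5},
            {"threshold_percent": 0.75},
            {"threshold_percent": 1.0},
        ],
        "all_updates_rule": {
            "pubsub_topic": PUBSUB_TOPIC,
            "monitoring_notification_channels": NOTIFICATION_CHANNELS,
        },
    }


def render(projects: dict[str, tuple[str, int]]) -> str:
    """projects maps project_id -> (project_number, budget in USD)."""
    budgets = {
        pid: budget_resource(pid, number, amount)
        for pid, (number, amount) in projects.items()
    }
    return json.dumps({"resource": {"google_billing_budget": budgets}}, indent=2)


if __name__ == "__main__":
    print(render({"student-lab-01": ("123456789012", 50)}))
```

Writing the output to a file such as `budgets.tf.json` lets Terraform manage the budgets with the rest of the stack; a `for_each` over a project map in plain HCL achieves the same result if you prefer not to generate files.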

Current approach:

  • Use GCP Budget alerts with notifications (Pub/Sub and/or email).

  • Trigger a Python-based Cloud Function (Gen2) from Pub/Sub.

  • Use the Cloud Function to identify and stop or disable running resources.

  • Use Terraform to provision and manage the infrastructure.
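The Pub/Sub leg of this flow could start with something like the sketch below. A Gen2 function triggered by Pub/Sub receives a CloudEvent whose `data["message"]["data"]` field holds the base64-encoded budget notification JSON (fields such as `budgetDisplayName`, `costAmount`, `budgetAmount` and, once a threshold has been crossed, `alertThresholdExceeded`). Because the notification does not directly name the project, this sketch assumes the hypothetical `budget-<project_id>` display-name convention to map each message back to exactly one project. In a deployed function, an entry point decorated with `@functions_framework.cloud_event` would call `decode_budget_event(cloud_event.data)`.

```python
import base64
import json
from typing import Optional

BUDGET_PREFIX = "budget-"  # hypothetical naming convention: budget-<project_id>


def decode_budget_event(event_data: dict) -> dict:
    """Decode the budget notification from a Pub/Sub CloudEvent's data field."""
    message = event_data["message"]
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    # Keep message attributes (e.g. budgetId) alongside the decoded body.
    payload["_attributes"] = message.get("attributes", {})
    return payload


def project_for(payload: dict) -> Optional[str]:
    """Map a notification back to exactly one project via the display name."""
    name = payload.get("budgetDisplayName", "")
    if name.startswith(BUDGET_PREFIX) and len(name) > len(BUDGET_PREFIX):
        return name[len(BUDGET_PREFIX):]
    return None


def decide_action(payload: dict) -> tuple[Optional[str], str]:
    """Return (project_id, action); action is ignore, notify-only or shutdown."""
    project_id = project_for(payload)
    threshold = payload.get("alertThresholdExceeded")
    if project_id is None or threshold is None:
        return project_id, "ignore"
    if threshold >= 1.0:
        return project_id, "shutdown"
    # 50% and 75% are assumed to be handled by the budget's email notifications.
    return project_id, "notify-only"
```

Budget notifications are published to the topic several times a day, not only at the moment a threshold is crossed, so the shutdown path should be idempotent: receiving `shutdown` repeatedly for the same project must be harmless.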

Challenges / Questions:

  1. What is the recommended architecture for implementing this type of budget-based auto shutdown system across multiple projects?

  2. How can I reliably identify and handle different resource types (e.g., Compute Engine, GKE, Cloud Run, etc.), given that not all services can be directly “stopped”?

  3. What are the best practices for configuring budget alerts for multiple thresholds (50%, 75%, 100%) with both email and Pub/Sub notifications?

  4. What are the best practices for ensuring this operates safely and does not unintentionally disrupt critical resources?

  5. Are there any limitations or delays in budget alert notifications that could affect real-time cost control?
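For question 4, one common pattern is to layer several guards before anything is stopped: act only on projects explicitly enrolled in auto-shutdown, re-check that each resource belongs to the over-budget project, skip resources carrying an exemption label, and default to a dry run. The `cost-control=exempt` label, the `Resource` type and `plan_shutdown` below are hypothetical illustrations, not GCP conventions or APIs.

```python
from dataclasses import dataclass, field

# Hypothetical label convention: resources labeled cost-control=exempt are never touched.
EXEMPT_KEY, EXEMPT_VALUE = "cost-control", "exempt"


@dataclass
class Resource:
    """Hypothetical minimal view of a discovered resource."""
    kind: str
    name: str
    project_id: str
    labels: dict[str, str] = field(default_factory=dict)


def plan_shutdown(
    resources: list[Resource],
    target_project: str,
    enrolled_projects: set[str],
    dry_run: bool = True,
) -> list[str]:
    """Return the actions that would be taken for one over-budget project."""
    # Guard 1: only projects explicitly enrolled in auto-shutdown are eligible.
    if target_project not in enrolled_projects:
        return []
    plan = []
    for r in resources:
        # Guard 2: never touch resources from any other project.
        if r.project_id != target_project:
            continue
        # Guard 3: honor the exemption label.
        if r.labels.get(EXEMPT_KEY) == EXEMPT_VALUE:
            continue
        # Guard 4: mark entries as dry-run so the caller can log them without executing.
        prefix = "DRY-RUN " if dry_run else ""
        plan.append(f"{prefix}{r.kind}:{r.name}")
    return plan
```

Running in dry-run mode for a while and comparing the logged plans with what students actually had running is a low-risk way to validate the filters before enabling real shutdowns.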

Additional context:

  • I am using Terraform for infrastructure provisioning.

  • The automation logic will be implemented using Python in Cloud Functions.

  • I understand that GCP does not provide a single API to stop all resources, so I am looking for a practical and scalable approach to handle different resource types.
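Since there is no single stop API, a per-resource-type handler registry keeps the logic extensible: each supported type maps to a function, and anything without a handler is reported instead of silently ignored. The keys are assumed to follow Cloud Asset Inventory asset type names (which could be used to list a project's resources); the handlers are stubs that only describe the action a real implementation would take through the corresponding client library (stopping a VM, resizing a GKE node pool to zero nodes, setting a Cloud SQL instance's activation policy to `NEVER`).

```python
from typing import Callable

# A handler receives (project_id, resource_name) and returns a description of its action.
Handler = Callable[[str, str], str]


def stop_compute_instance(project: str, name: str) -> str:
    # Stub: a real handler would call the Compute Engine instances.stop method.
    return f"stop instance {name} in {project}"


def scale_gke_node_pool(project: str, name: str) -> str:
    # Stub: a real handler would set the node pool size to 0 via the GKE API.
    return f"resize node pool {name} in {project} to 0 nodes"


def stop_cloud_sql(project: str, name: str) -> str:
    # Stub: a real handler would patch settings.activationPolicy to NEVER.
    return f"set activation policy NEVER on Cloud SQL {name} in {project}"


# Keys assumed to match Cloud Asset Inventory asset type names.
HANDLERS: dict[str, Handler] = {
    "compute.googleapis.com/Instance": stop_compute_instance,
    "container.googleapis.com/NodePool": scale_gke_node_pool,
    "sqladmin.googleapis.com/Instance": stop_cloud_sql,
}


def dispatch(project: str, assets: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Route each (asset_type, name) pair to its handler; collect unsupported types."""
    actions: list[str] = []
    unhandled: list[str] = []
    for asset_type, name in assets:
        handler = HANDLERS.get(asset_type)
        if handler is None:
            unhandled.append(f"{asset_type}:{name}")
        else:
            actions.append(handler(project, name))
    return actions, unhandled
```

Services with no stop operation, such as Cloud Run services in this sketch, end up in `unhandled`, so someone can decide per service whether to scale it down, restrict it, or delete it.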

Any guidance, reference architectures, or best practices would be greatly appreciated.


About the budget alerts: they are not immediate. Resource consumption is only reflected in budgets after a delay of a few hours (reference link).
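Because spend keeps accruing while usage data catches up, a 100% trigger can fire after the budget is already exceeded. A rough way to compensate is to act earlier, at the fraction of the budget that leaves room for the expected spend during the reporting delay. The delay and the hourly spend rate are assumptions you would have to estimate; Google does not guarantee a fixed latency.

```python
def trigger_fraction(budget: float, hourly_spend: float, delay_hours: float) -> float:
    """Budget fraction at which to act so that spend accrued during the
    reporting delay (assumed, not guaranteed) still fits in the budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    expected_overrun = hourly_spend * delay_hours
    # Never return a negative fraction; 0.0 means "act immediately".
    return max(0.0, 1.0 - expected_overrun / budget)
```

For example, a 50 USD lab budget, an estimated 2 USD/hour burn rate and an assumed 6-hour delay give `trigger_fraction(50, 2, 6)` of about 0.76, so the shutdown threshold would sit near 76% of the budget rather than 100%.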

Limiting usage with quotas is a better way to restrict consumption.
Quotas are operational limits: if a student tries to exceed a quota, the request is rejected immediately.