Hello GKE Team,
I’ve encountered challenges during GKE cluster deployments, especially when working with GPU node pools. Identifying quota-related issues has been difficult, leading to deployment failures.
Currently, the quota check happens only after submitting the “Create” request, often resulting in failure screens after a wait time of over five minutes. This delay impacts planning and deployment efficiency.
Suggestion:
Consider implementing a basic pre-check (quota and network-related checks) directly on the GKE cluster creation screen. This would let users identify potential quota limitations before submission, improving the overall deployment experience.
I am looking forward to hearing your thoughts on this enhancement.
Thank you!
It’s a great suggestion to identify the issue in advance for consumers who create clusters from the console.
From my personal experience, though, there are a few trade-offs to keep in mind.
- Most consumers use automation tools (such as Terraform, the CLI, or GCP Deployment Manager) to keep resource creation consistent, and behind the scenes these all call the API. A console-only check won't help this larger base of consumers.
- GKE uses many GCP resources, so deciding which quota limits to show would be tricky. For example, each region has its own set of resource quotas and limits. There are quotas for NVIDIA/AI machine types in certain regions, and not all consumers use them.
Overall, a console notice might be useful for some consumers, but for others it would be noise or unnecessary notifications.
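For those who create clusters through automation, a pre-check along these lines could run before the create call. This is only a minimal sketch, not how GKE validates quota: it assumes the regional quota list has the shape returned by `gcloud compute regions describe REGION --format=json` (entries with `metric`, `limit`, and `usage`), and the helper names `load_region_quotas` and `find_quota_shortfalls` are hypothetical. It covers only regional Compute Engine quotas, and it treats a metric missing from the list as having nothing available, which is an assumption.

```python
import json
import subprocess


def load_region_quotas(region: str) -> list:
    """Fetch regional quotas with the gcloud CLI (needs an authenticated gcloud)."""
    result = subprocess.run(
        ["gcloud", "compute", "regions", "describe", region, "--format=json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout).get("quotas", [])


def find_quota_shortfalls(quotas: list, required: dict) -> dict:
    """Return {metric: missing amount} for every requirement that does not fit.

    quotas   -- list of {"metric": str, "limit": float, "usage": float}
    required -- what the planned node pools need, e.g. {"CPUS": 8}
    """
    remaining = {q["metric"]: q["limit"] - q["usage"] for q in quotas}
    shortfalls = {}
    for metric, needed in required.items():
        # Assumption: a metric absent from the list has nothing available.
        available = remaining.get(metric, 0.0)
        if needed > available:
            shortfalls[metric] = needed - available
    return shortfalls


# Example (hypothetical region and requirements for a small GPU node pool):
# quotas = load_region_quotas("us-central1")
# print(find_quota_shortfalls(quotas, {"CPUS": 8, "NVIDIA_T4_GPUS": 1}))
```

An empty result means the listed requirements fit within the remaining regional quota. Other limits can still cause the create request to fail.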
I would prefer to set up alerting for the specific resources that you care about, from Monitoring → Alerting → Create Policy:
- Add a condition:
  - Click Add Condition.
  - Select Quota Metric from the available metric categories.
- Select a metric:
  - Search for a specific quota metric, such as:
    - Quota Usage (vCPU usage)
    - Quota Usage (Persistent Disk usage)
    - Quota Usage (In-use IP addresses)
  - Choose the desired metric and resource type.
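The kind of condition such an alert evaluates can be sketched in a few lines. This is an illustration of the threshold logic only, not the Cloud Monitoring API: it assumes quota entries shaped as `{"metric", "limit", "usage"}`, and the function name `quotas_over_threshold` and the 80% default are hypothetical choices.

```python
def quotas_over_threshold(quotas: list, threshold: float = 0.8) -> list:
    """Return (metric, utilization) pairs at or above the threshold, highest first.

    quotas -- list of {"metric": str, "limit": float, "usage": float}
    """
    flagged = []
    for q in quotas:
        limit = q["limit"]
        if limit <= 0:
            # A zero limit has no meaningful utilization ratio; skip it.
            continue
        ratio = q["usage"] / limit
        if ratio >= threshold:
            flagged.append((q["metric"], ratio))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Alerting at a ratio below 100% leaves time to request a quota increase before a deployment fails.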
Seems reasonable, and we are doing more to integrate quota management with GKE.
In terms of asking for this feature, it would be great if you could file a feature request: https://issuetracker.google.com/issues/new?component=187077&template=1162666&pli=1