I am unable to create a node pool with N1 machine types and T4 GPUs. I tried changing the region and switching to another project in us-central1, but even then the request could not be fulfilled: only 2 of the 3 requested nodes were created.
I have two questions:
Is there a way to determine which region has the highest GPU availability?
If we manage to allocate nodes with GPUs on GKE, in the event of node restarts due to maintenance/updates performed by GCP itself, would the GPU remain allocated to us, or is there a risk of losing them?
The issue you’re facing is likely due to insufficient GPU quota or resource availability in the selected region or zone. This can happen even in regions like us-central1, which typically have higher resource availability.
Start by verifying your GPU quota (the NVIDIA_T4_GPUS metric in the target region, plus the CPU and persistent-disk quotas for the node pool's machine type) and confirming that T4 GPUs are actually offered in the specific zones your node pool uses. Note that quota and availability are separate things: you can have sufficient quota and still hit a transient capacity shortage (a "stockout") in a given zone, which matches the symptom of 2 out of 3 nodes coming up.
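As a sketch, assuming the gcloud CLI is installed and authenticated against your project (the region name is a placeholder), you can check both from the command line:

```shell
# List the zones where T4 GPUs are offered (this reflects where the
# accelerator type exists, not real-time capacity).
gcloud compute accelerator-types list \
    --filter="name=nvidia-tesla-t4" \
    --format="table(zone, name)"

# Show the T4 quota (limit vs. current usage) for a region; the
# grep context lines pull in the limit/usage fields around the metric.
gcloud compute regions describe us-central1 \
    --format="json(quotas)" | grep -B2 -A2 NVIDIA_T4
```

If the limit is 0 or lower than the node count you need, request a quota increase from the IAM & Admin > Quotas page before retrying the node pool.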
If you have already checked quotas and tried other zones and regions without success, you can contact Google Cloud Support for further assistance on capacity in specific regions.
To find where T4 GPUs are offered, refer to Google Cloud's GPU regions and zones page, which lists the zones in which each GPU model is available. Be aware that this page shows where GPUs are offered, not live capacity; Google Cloud does not publish a real-time per-zone availability checker, so spreading your node pool across several of the listed zones is the practical way to improve your chances.
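There is no official ranking of regions by GPU availability, but as a rough proxy you can count how many zones in each region offer T4s; more zones means more places for GKE to find capacity. A sketch, assuming gcloud and a POSIX shell:

```shell
# Count T4-capable zones per region: take the zone names, strip the
# trailing zone letter to get the region, then tally and sort.
gcloud compute accelerator-types list \
    --filter="name=nvidia-tesla-t4" \
    --format="value(zone)" \
  | cut -d- -f1-2 | sort | uniq -c | sort -rn
```

Regions at the top of this list give a multi-zone node pool the most fallback options, though actual capacity still varies moment to moment.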
Regarding your second question: GPU VMs do not support live migration, so when GCP performs host maintenance or GKE upgrades a node, that node is terminated and recreated, and the replacement node must acquire GPUs again from the zone's capacity at that moment. In practice the replacement usually succeeds, but if the zone is short on T4s at that time, the new node can fail to come up, so there is a risk of temporarily losing GPU capacity. If you need a guarantee, create a capacity reservation for the GPU machine type; reserved capacity stays assigned to your project across node recreations.
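A minimal sketch of reserving T4 capacity, assuming gcloud is authenticated (the reservation name, zone, node count, and machine type are placeholders you would adapt):

```shell
# Reserve capacity for 3 N1 VMs with one T4 GPU each. While the
# reservation exists, this capacity is held for your project even
# if the zone is otherwise sold out.
gcloud compute reservations create t4-reservation \
    --zone=us-central1-a \
    --vm-count=3 \
    --machine-type=n1-standard-4 \
    --accelerator=count=1,type=nvidia-tesla-t4
```

A GKE node pool can then consume this capacity by passing `--reservation-affinity=specific --reservation=t4-reservation` (and a matching machine type and zone) when you create the node pool. Note that reservations are billed whether or not the VMs are running.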