GCP Batch unable to run jobs for us since 2025 Nov 11th

gradientopt · November 13, 2025, 11:02am

Since Nov 11th, 2025, our team’s Batch jobs always end up like this: the job stays in the SCHEDULED state for a long while, with several VM instances already spun up but no task actually running. Later it fails with the following error:

This should not happen, because we only requested some c2-standard-30 or e2-highmem-16 VMs with the Standard provisioning model in us-central1, which should be very easy to satisfy. Could someone DM me and help us take a look? Thanks!

Aleksander_Mielczare · November 20, 2025, 1:17pm

Hi! There are several ways to approach this problem, but from the information you provided I would recommend verifying your job configuration first. The gcloud CLI can list which machine types are offered in a given region or zone, e.g.:

gcloud compute machine-types list
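
To narrow the listing to the machine types and zones from the original post, something like the following might help. It is only a sketch: the function name `list_machine_types` and the `GCLOUD` override variable are made up for illustration, and the zone list in the example assumes the four us-central1 zones. Note that the output only tells you a type is offered in a zone, not that capacity is currently free there.

```shell
# Command to invoke; override (e.g. GCLOUD=echo) to preview without calling the API.
GCLOUD="${GCLOUD:-gcloud}"

# list_machine_types "<space-separated types>" "<comma-separated zones>"
# (hypothetical helper) Lists which of the given machine types are offered per zone.
# This shows that a type is offered there, not that capacity is free right now.
list_machine_types() {
  types="$1"
  zones="$2"
  "$GCLOUD" compute machine-types list \
    --zones="$zones" \
    --filter="name=($types)" \
    --format="table(name,zone,guestCpus,memoryMb)"
}

# Example:
# list_machine_types "c2-standard-30 e2-highmem-16" \
#   "us-central1-a,us-central1-b,us-central1-c,us-central1-f"
```

If a type is missing from a zone that the job's allocation policy allows, that zone can never satisfy the request regardless of capacity.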

Machine type is not the only reason a ZRPE (ZONE_RESOURCE_POOL_EXHAUSTED) error can happen. Resource types that are subject to stockouts include:

Compute resources (vCPUs and memory):
- Specific VM families (e.g., N1, N2, N2D, E2, C2, C3, M1, M2, M3, A2, A3, G2, etc.).
- Specific VM shapes (e.g., n2-standard-64, c2-highmem-32).
- Minimum CPU platforms (e.g., requesting Intel Ice Lake or later).
- The sheer amount of cores or RAM requested.

Accelerators:
- Specific types and counts of GPUs (e.g., NVIDIA T4, V100, A100, H100).
- TPUs (Tensor Processing Units).

Storage:
- Local SSD: lack of available Local SSD capacity on machines that match the other VM requirements.
- Persistent Disk (PD): while this sometimes manifests as a PD_STOCKOUT, the inability to create the required PD (due to lack of cell-level capacity for HDD, SSD, or IOPS) can cause the VM creation to fail, sometimes still surfacing as a general ZRPE to the user.

Another good test is to temporarily create an instance (or a single-VM regional managed instance group) with the same spec; this would tell you whether the region supports that configuration. From your description it seems it does, since several instances were spun up.
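
A rough sketch of such a probe with the gcloud CLI could look like this. The helper names `probe_create` and `probe_delete`, the instance name `batch-capacity-probe`, and the `GCLOUD` override variable are all hypothetical; adjust the machine type and zone to match your job's allocation policy.

```shell
# Command to invoke; override (e.g. GCLOUD=echo) to preview without creating anything.
GCLOUD="${GCLOUD:-gcloud}"
PROBE_NAME="${PROBE_NAME:-batch-capacity-probe}"  # hypothetical instance name

# probe_create <machine-type> <zone>
# Tries to create one VM with the Standard provisioning model.
# A non-zero exit status means the create request failed.
probe_create() {
  "$GCLOUD" compute instances create "$PROBE_NAME" \
    --machine-type="$1" \
    --zone="$2" \
    --provisioning-model=STANDARD
}

# probe_delete <zone>
# Removes the probe VM again so it does not keep accruing cost.
probe_delete() {
  "$GCLOUD" compute instances delete "$PROBE_NAME" --zone="$1" --quiet
}

# Example:
# probe_create c2-standard-30 us-central1-a && probe_delete us-central1-a
```

If the create call fails with a resource-exhaustion message for a zone, that points to capacity rather than to the Batch job configuration.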

Once you have excluded these causes, I encourage you to raise a support ticket with details of a recent failure.
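
For the ticket, it may help to attach the job's identifier and status. A minimal sketch, assuming the job name and location are known (the `describe_job` helper and the `GCLOUD` override variable are made up for illustration):

```shell
# Command to invoke; override (e.g. GCLOUD=echo) to preview without calling the API.
GCLOUD="${GCLOUD:-gcloud}"

# describe_job <job-name> <location>
# (hypothetical helper) Prints the job's name, UID and status block as YAML.
describe_job() {
  "$GCLOUD" batch jobs describe "$1" \
    --location="$2" \
    --format="yaml(name,uid,status)"
}

# Example (replace the placeholder with a recently failed job):
# describe_job JOB_NAME us-central1 > failed-job.yaml
```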
