Hi,
I’m encountering a persistent issue with Vertex AI batch image generation.
The Problem
I am trying to generate images using a batch job, but:

- The job runs for a long time and then fails with:
  Deadline exceeded due to job running for maximum allowed duration of 24 hours. Please retry the unprocessed rows or start over with a smaller batch size.
- Worse, the job processes only about 5–40 of the 300 images before failing.
- This has been happening repeatedly for about a week.
I also tried reducing the batch size:

- With 10 images, the job failed after about 4.5 hours with the following error:
  System error. Please try this operation again. If the issue persists please visit https://cloud.google.com/support-hub to view your support options.
Key Concern
The job is making almost no progress before failing, which suggests this is not simply a batch size limitation.
Environment
- Model: gemini-3-pro-image-preview and gemini-3.1-flash-image-preview
- Region: global
- Execution method: Batch prediction
- Authentication: Service account (API-based usage)
- Input batch size:
  - 300 images → fails (only ~5 processed)
  - 10 images → fails early (~4.5 h, system error)
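For context, each line of the input JSONL is assembled roughly as below. This is an illustrative sketch: the prompts, the helper name, and the generation settings are placeholders, not my exact production values.

```python
import json

def build_batch_input(prompts):
    """Return one JSONL line per image-generation request.

    Each line wraps a GenerateContent-style request under a "request" key,
    which is the per-row shape the batch job consumes. The generationConfig
    here is a placeholder, not my real settings.
    """
    lines = []
    for prompt in prompts:
        row = {
            "request": {
                "contents": [
                    {"role": "user", "parts": [{"text": prompt}]}
                ],
                "generationConfig": {"responseModalities": ["IMAGE"]},
            }
        }
        lines.append(json.dumps(row))
    return lines

lines = build_batch_input(["a red bicycle", "a lighthouse at dusk"])
print(len(lines))  # 2
print(json.loads(lines[0])["request"]["contents"][0]["parts"][0]["text"])  # a red bicycle
```

The resulting lines are written to a .jsonl file in Cloud Storage and passed as the batch job's input source.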
-
What I’ve Tried
- Reduced batch size (300 → 10)
- Retried multiple times over several days
- Verified API usage via service account
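To gauge progress, I scan the output JSONL the batch job writes. A minimal sketch of that check is below; the per-row field names ("response" on success, "status" on failure) reflect what I see in my output files, so treat them as assumptions if your layout differs.

```python
import json

def summarize_output(jsonl_text):
    """Count successful rows and collect (line_index, error) for failed ones.

    Assumes each output line is a JSON object containing either a "response"
    key (success) or a "status" key (failure) -- an observed layout, not a
    documented guarantee.
    """
    ok, failed = 0, []
    for i, line in enumerate(jsonl_text.splitlines()):
        if not line.strip():
            continue
        row = json.loads(line)
        if "response" in row:
            ok += 1
        else:
            failed.append((i, row.get("status", "unknown error")))
    return ok, failed

# Synthetic two-row output for illustration.
sample = "\n".join([
    json.dumps({"response": {"candidates": []}}),
    json.dumps({"status": "Deadline exceeded"}),
])
ok, failed = summarize_output(sample)
print(ok, len(failed))  # 1 1
```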
Questions
- Is this expected behavior for batch image generation, or could this indicate a backend/service issue?
- Are there any hidden limits or constraints (throughput, concurrency, etc.) for batch image generation?
- Could this be related to region or resource allocation?
- Is there any way to identify which rows are failing internally?