429 with no quota or rate limit hit

Hi, we’re on Vertex and sending requests to Gemini 2.5 Pro (GA). We keep getting 429 errors, but we are far from hitting any quotas visible in the quota panel (the highest usage is 2%, on an unrelated quota). Is this a known issue? We are not using a lot of tokens.

Hey,

Hope you’re keeping well.

A 429 from the Vertex AI Gemini API can occur even when you haven’t reached any of the quotas visible in the console, because there are backend rate limits that aren’t exposed in the quota dashboard. These limits can be per-project, per-region, or tied to concurrency on the model, especially for high-demand GA models like Gemini 2.5 Pro.

I’d recommend checking Vertex AI > Monitoring > Requests in the Cloud Console to see your actual request patterns, and trying to lower concurrency or add small delays between calls. If the issue persists, open a support case with your request IDs so the Vertex AI team can confirm whether you’re hitting hidden service limits.
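The retry side of that advice can be sketched roughly like this. This is a minimal Python sketch, not the official client behavior; `RateLimitError` is a stand-in for whatever exception your client raises on a 429 (for the Google client libraries that's typically a resource-exhausted error):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the client library's 429 / resource-exhausted error."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying rate-limited calls with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the 429 to the caller
            # 1s, 2s, 4s, ... plus jitter so parallel workers don't retry in sync
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

You'd wrap each Gemini request in `call_with_backoff(lambda: ...)`; the jitter matters when several workers hit the limit at the same moment.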

Thanks and regards,
Taz

Thank you Taz, appreciate your help! We’re monitoring and logging through other services, so we have not configured the monitoring interface per region yet. This makes sense though; I remember seeing more detailed quota information on individual models in the quota panel last year. So the current best practice for keeping track of usage and quotas is to set up Vertex AI Monitoring individually for each model+region combo we’re using, which might give us the necessary info?

In case someone searches and finds this: it seems Vertex AI Monitoring is for custom models, not Gemini models. Gemini on Vertex uses Dynamic Shared Quota (DSQ), and the answer is here:

We understand that encountering a ‘resource exhausted’ 429 error can be frustrating and might lead you to suspect you are hitting some sort of quota limit. However, with DSQ, this is not the case. These errors indicate that the overall shared pool of resources for that specific type (e.g., a particular model in a specific region) at a specific time is experiencing extremely high demand from many users simultaneously.