How do I find which quota I actually hit?

Recently I (probably) hit a concurrent AI model API rate limit when calling gemini-2.5 prediction and received error code 429: “Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429 for more details.”

I wanted to see what my actual quota limits are, so I went into the Vertex AI API metrics to see the logs.

I can see that the method that produced the error is “google.cloud.aiplatform.v1.PredictionService.GenerateContent”, but that’s not helpful in debugging which quota I hit.

I have tried going to the quotas and system limits page for my project, and it’s practically impossible to find the relevant quota information for specific models (or regions) because the model names are a complete mess.

I tried narrowing it down by Service: Vertex AI API, and even setting a specific region like us-central1, but there are hundreds of entries in the paginated quota list.

Now, I know I called “gemini-2.5-pro-preview-03-25”, but there is no such model in the “base_model” dimension; the newest model I can filter on there is “gemini-2.0-flash-live”. Then there are random entries like “gemini-experimental”, there is another dimension called “model”, and yet another one called “base_model_id_and_resolution”.

What do I have to put into the filters to find out the actual quotas for the aforementioned gemini-2.5-pro-preview across different regions?

And is there really no simple way to jump directly to the specific rate limit from the Vertex AI API panel where I see the failing call?

Hi @randomstuff,

Welcome to the Google Cloud Community!

It looks like you are encountering an issue identifying which of your Vertex AI quotas triggered your 429 error, and then navigating your Google Cloud Quotas interface to find the specific limit for your Generative AI model.

Here are the potential ways that might help with your use case:

  • Identify the affected quota: To identify which of your quotas triggered the 429 error, you can use your Cloud Monitoring Metrics Explorer by selecting the Generative AI Model resource type and reviewing metrics such as ‘concurrent_requests’, ‘request_count (rate)’, and ‘token_count (rate)’ for your region and base model.
  • Specify Region: Make sure you’re viewing your Quotas page filtered to the exact region where you’re making your gemini-2.5-pro-preview-03-25 requests (e.g., us-central1).
  • Use Keyword Search: Avoid searching for the full model name. Try looking up general Generative AI inference quotas like “requests per minute,” “tokens per minute,” and “concurrent” under the “Vertex AI API” service instead.
  • Consider Raising a Quota Increase Request: After you’ve identified the specific quota, such as “Generative AI model serving (tokens per minute),” you can request an increase straight from the Quotas page if your usage needs go beyond the default limits.
  • Set up Cloud Monitoring dashboards and alerts: Consider setting up a Cloud Monitoring dashboard and configuring alerts to help track your usage relative to quota limits and avoid future 429 errors.
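For the first bullet, a Metrics Explorer filter might look roughly like the following. The resource type and metric path are my best guess from the Cloud Monitoring metrics list (the console shows the resource type `aiplatform.googleapis.com/PublisherModel` under the label “Vertex AI Model Garden Publisher Model”), so please verify the exact names in your own project before relying on them:

```
resource.type = "aiplatform.googleapis.com/PublisherModel"
metric.type = "aiplatform.googleapis.com/publisher/online_serving/model_invocation_count"
resource.labels.location = "us-central1"
resource.labels.model_user_id = "gemini-2.5-pro-preview-03-25"
```

Swapping the metric type for a token-count metric under the same `publisher/online_serving/` prefix should give you the tokens-per-minute view.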

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

I will be honest, I am completely desperate and lost. I have been trying to navigate the Metrics Explorer for an hour to no avail.

By “Cloud Monitoring Metrics Explorer”, you mean this, right?

Also, the Metrics Explorer does not have any region filter, nor a “keyword search”.

When I click on the Metric and then select a resource, there is no “Generative AI Model”; I can only see “Vertex AI Model Garden Publisher Model” and “Vertex AI Endpoint”.

When I pick some metric I see /some/ graph, but nothing I pick shows anything related to error rate.

The API method I call is https://us-central1-aiplatform.googleapis.com/v1/projects/[project-id]/locations/us-central1/publishers/google/models/gemini-2.5-pro-preview-03-25:generateContent

I also know the exact time at which the quota limit was hit. It is quite strange, though, because I called the API only once in a 30-minute timeframe and it returned Resource exhausted without specifying what went wrong.

We are having the same problem. The response from Google was not at all helpful. I agree with the OP that there is no way (that I can see) to find out what we are doing that is causing the errors. We are already doing the backoff and retry. We have no way to see how google.cloud.aiplatform.v1.PredictionService.GenerateContent relates to what we are doing, or which quotas are related to these errors. Our quota page (which is nearly impossible to use or find anything in) shows we are not near any quota limits.

And this forum also has a terrible interface.
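For what it’s worth, our retry handling is just the textbook exponential-backoff-on-429 pattern. Here is a minimal self-contained sketch; `RateLimitError` and `flaky_call` are stand-ins for whatever exception and client call you actually use, not anything from the Google SDK:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 / RESOURCE_EXHAUSTED response."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn() with exponential backoff plus jitter on rate-limit errors."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the final retry
            # base, 2*base, 4*base, ... with up to 100% random jitter added
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)

# Example: a fake call that fails twice with a 429, then succeeds.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("Resource exhausted")
    return "ok"

print(call_with_backoff(flaky_call, base_delay=0.01))  # prints: ok
```

But as said above, backoff only hides the symptom; it doesn’t tell you which quota you are actually exhausting.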

@randomstuff are you still experiencing this error with the latest models? The model you used above was an experimental/preview model, and these models usually have limited capacity. I’d suggest:

  1. Switch to a stable or generally available model, for example gemini-2.5-flash or gemini-2.5-pro
  2. Use the global endpoint, i.e. location="global"
  3. Subscribe to Provisioned Throughput if possible

Hope it helps.
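For point 2, the only things that change are the host and the location path segment. The regional URL pattern below is the one from the OP’s post; the global host is my reading of the docs, so double-check it:

```python
def generate_content_url(project, model, location="global"):
    """Build the :generateContent REST URL for a regional or global endpoint."""
    # Regional endpoints use a location-prefixed host; global uses the bare host.
    host = ("aiplatform.googleapis.com" if location == "global"
            else f"{location}-aiplatform.googleapis.com")
    return (f"https://{host}/v1/projects/{project}/locations/{location}"
            f"/publishers/google/models/{model}:generateContent")

print(generate_content_url("my-project", "gemini-2.5-pro", "us-central1"))
print(generate_content_url("my-project", "gemini-2.5-pro"))
```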

@ericdong hello, thank you for the suggestion! I have indeed switched to 2.5-pro after preview-03-25 went down in July.

However, this does not really solve the issue: even though 2.5-pro has more capacity, I still don’t actually know which quota limit I hit. Updating my location won’t help either, because if anything running on 2.5-pro in production hits a limit, I still will never know which exact error or API rate limit it was. If something goes wrong I need to know what, so I can tackle the issue. To this day it has been a month and a half and literally nobody has been able to tell me how to find this information in the developer console, as if even Google employees did not know where it is. Maybe you could help me out?

@Robert_Berger I understand your frustration; the developer console experience has been consistently negative over the years, even back when you just wanted to use the Google Maps API and it took half a day just to get an API key. Regarding our current case: we were testing out Google Cloud for Gemini but did not end up using it, because of the console’s complexity, the unclear API rate-limit terms, and the inability to find out what actually went wrong during testing. For now we are using OpenAI LLMs in production, because they have very clear rate-limiting rules all in one place and give you an exact error indicator if anything goes wrong.

Hello @randomstuff,

Can you open the GCP Vertex AI API/Service Details page and then add more Column display options?

And then, add these 3 columns.

This may help you to troubleshoot which API is being overused.

Next, while your workflow is running, you can monitor the Current Usage Percentage by sorting it (descending: highest first).

The downside of this method is that you have to keep refreshing the page until you find the limited API. This may not be the smartest method, but it’s easy and straightforward.


Another way to troubleshoot a quota exceeded error is to go to the Logs Explorer and search by Metric Name:

protoPayload.status.details.quotaExceeded.metricName="[Metric Name]"

If nothing is found, you may use a broader search and explore the logs yourself:

SEARCH("quota")

Be sure to use an appropriate Time Window. If you don’t know which one to use, pick a large option like Last 7 days or more.

Once you’ve found an interesting log entry, you can click on it and then click Expand nested fields, so you can look at Scope to find the zone and region information.

If you used the broader search using SEARCH("quota"), you may be interested to look for the field metricName.
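If you export an interesting entry as JSON, you can also dig the field out programmatically. The sketch below just walks the JSON looking for any `metricName` key; the example entry’s shape mirrors the protoPayload.status.details.quotaExceeded.metricName path from the query above, but the real nesting (and the metric name shown) may differ, so treat both as assumptions:

```python
import json

def find_metric_names(entry):
    """Recursively collect every 'metricName' value from a log entry."""
    found = []
    if isinstance(entry, dict):
        for key, value in entry.items():
            if key == "metricName":
                found.append(value)
            else:
                found.extend(find_metric_names(value))
    elif isinstance(entry, list):
        for item in entry:
            found.extend(find_metric_names(item))
    return found

# Hypothetical, heavily simplified log entry shaped like the query path above.
entry = json.loads("""
{
  "protoPayload": {
    "status": {
      "code": 8,
      "details": [
        {"quotaExceeded": {"metricName": "aiplatform.googleapis.com/generate_content_requests"}}
      ]
    }
  }
}
""")
print(find_metric_names(entry))  # -> ['aiplatform.googleapis.com/generate_content_requests']
```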

Once you have the metricName and the region, you can go back to the GCP Vertex AI API/Service Details page to search by Metric:

Then you may need to increase an API quota depending on its region, or work around the limit by distributing the load over multiple regions.

@randomstuff @Robert_Berger thank you for providing more details and I appreciate the feedback.
I hope @LeoK’s instructions above help with your troubleshooting.

@LeoK thank you for the kind response and extensive guide

I have tried firing some of my old API calls and looking at that “Quota system limits” page that is supposed to show my actual usage. I was calling https://us-central1-aiplatform.googleapis.com/v1/projects/[project]/locations/us-central1/publishers/google/models/gemini-2.5-pro:generateContent as usual, and on the quota limit page I used the filter “Current usage > 0”. That list showed the only quota I was using was “Online prediction requests per minute per region” at “1/30,000”, which is incredibly suspicious, because I am pretty sure Google must track tokens per minute (or day?) and requests per minute (or day?), but that page is just not showing me what I’m using.

Also, thanks for the Logs Explorer tip! I was hoping to find my Vertex API calls in there, but in the logs I see absolutely nothing around the time the error happened.

My server logs clearly state that the 3 errors occurred at 2025-06-24 13:42:45, 2025-06-26 12:38:18, and 2025-07-07 10:02:10, all with the same message:

Failed to analyze video with Gemini API. HTTP code: 429. {
  "error": {
    "code": 429,
    "message": "Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429 for more details.",
    "status": "RESOURCE_EXHAUSTED"
  }
}

The only API logs I see are CreateArtifact, UpdateDataset, CreateDatasetVersion, and CreateArtifact from one very specific date in August (08-08-2025), but there are no logs before and no logs after, even though we made API calls to https://us-central1-aiplatform.googleapis.com/v1/projects/[project]/locations/us-central1/publishers/google/models/gemini-2.5-pro:generateContent at different times and dates. In other words: the Logs Explorer seems completely random and does not show all the logs, even with all filters turned off.

@ericdong: unfortunately no. Absolutely nobody has been able to tell me which quota we hit on that specific date, or how to address this issue if it happens again in the future. Even when talking with Gemini (which is a Google product) about how to get to this information, it gets lost just like me and cannot help. In short, the Google console experience has been nothing but agony, and we have absolutely zero confidence that we could use Vertex AI in production.