User-defined quotas seem to be delayed

Hello Community,

I am currently using the Search Grounding tool and attempting to implement cost controls by setting a hard limit on my API usage.

The Goal: I want to limit my usage to approximately 5,000 requests per month. To achieve this, I spread that budget over a 30-day month (5,000 / 30 ≈ 166) and configured a quota of 166 requests per day in the Google Cloud Console.
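For reference, here is the arithmetic behind the daily figure as a short Python sketch. The 28/30/31-day breakdown is my own illustration and not something the console computes. It shows that 166 per day matches a 30-day month, but that a 31-day month at 166 per day would exceed the 5,000 budget even if the quota were enforced perfectly.

```python
MONTHLY_BUDGET = 5000  # target requests per month


def daily_cap(monthly_budget: int, days_in_month: int) -> int:
    """Largest whole daily limit that keeps the month within budget."""
    return monthly_budget // days_in_month


for days in (28, 30, 31):
    cap = daily_cap(MONTHLY_BUDGET, days)
    print(f"{days}-day month: cap {cap}/day, at most {cap * days} requests")

# Worst case for a fixed 166/day quota in a 31-day month
print("166/day over 31 days:", 166 * 31)
```

A cap of 161 per day would stay under 5,000 in every month, at the cost of some unused budget in shorter months.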

The Issue: Despite setting this quota, my actual API usage consistently exceeds the 166-request daily limit.

  • The API continues to return successful responses (HTTP 200) well past the limit.

  • I am not receiving any HTTP 429 (Too Many Requests) or similar 4xx errors that would indicate the quota has been triggered.

  • I can see the usage counter increasing in the console (with some delay), so tracking seems to be active, but enforcement is missing.

My Question: Is the daily quota for Search Grounding treated as a “soft” limit or is there a significant latency in enforcement? Has anyone else successfully configured a hard stop for this specific API?

Any advice on how to strictly enforce this limit would be appreciated. Thanks!
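As a possible fallback that does not depend on server-side enforcement, a client-side counter could refuse to send requests once the daily cap is reached. Below is a minimal sketch under several assumptions: `DailyRequestGuard` and `DailyQuotaExceeded` are hypothetical names of my own, the counter lives in memory in a single process, and the day boundary is the local calendar date, which may not match when the console quota resets.

```python
import datetime as dt


class DailyQuotaExceeded(RuntimeError):
    """Raised when the local daily cap has been used up."""


class DailyRequestGuard:
    """In-memory, single-process daily request counter (illustrative only)."""

    def __init__(self, daily_limit, today=dt.date.today):
        self.daily_limit = daily_limit
        self._today = today      # injectable clock, handy for testing
        self._day = today()
        self._count = 0

    def acquire(self):
        """Reserve one request, or raise if today's cap is already reached."""
        now = self._today()
        if now != self._day:     # new calendar day: reset the counter
            self._day = now
            self._count = 0
        if self._count >= self.daily_limit:
            raise DailyQuotaExceeded(
                f"daily limit of {self.daily_limit} requests reached"
            )
        self._count += 1

    @property
    def used(self):
        return self._count


guard = DailyRequestGuard(daily_limit=166)
guard.acquire()  # call this before every API request
print(guard.used)
```

With several workers or restarts, the count would need to live in a shared store (a database row or similar) rather than in process memory, so this only illustrates the idea.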