429 Resource Exhausted error when using gemini-2.0-flash with LangChain

I hit the same error using LangChain with gemini-2.0-flash through the google_vertexai provider.

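```python
import os

import vertexai
from langchain.chat_models import init_chat_model
from langchain_google_vertexai import VertexAIEmbeddings

vertexai.init(project=os.environ.get("VERTEXAI_PROJECT_ID"), location=os.environ.get("VERTEXAI_PROJECT_LOCATION"))
llm = init_chat_model("gemini-2.0-flash", model_provider="google_vertexai")
embeddings = VertexAIEmbeddings(model="text-embedding-005")
# `graph` is the compiled LangGraph RAG graph, defined elsewhere (not shown)
for message, metadata in graph.stream(
    {"question": "What is Task Decomposition?"}, stream_mode="messages"
):
    print(message.content, end="|")
```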
```
Retrying langchain_google_vertexai.chat_models._completion_with_retry.<locals>._completion_with_retry_inner in 4.0 seconds as it raised ResourceExhausted: 429 Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429 for more details..
Retrying langchain_google_vertexai.chat_models._completion_with_retry.<locals>._completion_with_retry_inner in 4.0 seconds as it raised ResourceExhausted: 429 Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429 for more details..
Retrying langchain_google_vertexai.chat_models._completion_with_retry.<locals>._completion_with_retry_inner in 4.0 seconds as it raised ResourceExhausted: 429 Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429 for more details..
Retrying langchain_google_vertexai.chat_models._completion_with_retry.<locals>._completion_with_retry_inner in 8.0 seconds as it raised ResourceExhausted: 429 Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429 for more details..
Retrying langchain_google_vertexai.chat_models._completion_with_retry.<locals>._completion_with_retry_inner in 10.0 seconds as it raised ResourceExhausted: 429 Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429 for more details..
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/google/api_core/grpc_helpers.py", line 170, in error_remapped_callable
    return _StreamingResponseIterator(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/google/api_core/grpc_helpers.py", line 92, in __init__
    self._stored_first_result = next(self._wrapped)
                                ^^^^^^^^^^^^^^^^^^^
  File "/home/khteh/.local/lib/python3.12/site-packages/grpc/_channel.py", line 543, in __next__
    return self._next()
           ^^^^^^^^^^^^
  File "/home/khteh/.local/lib/python3.12/site-packages/grpc/_channel.py", line 969, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429 for more details."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.68.95:443 {grpc_message:"Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429 for more details.", grpc_status:8, created_time:"2025-03-11T12:35:24.732505078+08:00"}"
>
```

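As the console output shows, LangChain already retries with exponential backoff (waits of 4, 4, 4, 8 and 10 seconds), yet the call still fails after the fifth retry, after only about 30 seconds of waiting in total!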
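The waits in the log (4, 4, 4, 8, 10 s) look like an exponential backoff clamped between 4 and 10 seconds, so the whole retry budget is spent in roughly half a minute. One possible workaround is an outer retry with a longer, jittered budget. Below is a minimal sketch in plain Python. It is not a LangChain API: the name `retry_with_backoff` and its defaults are hypothetical. For the real call you would pass `google.api_core.exceptions.ResourceExhausted` as the exception to retry on. Whether a longer wait actually helps depends on the 429 being transient.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 8,
    base_delay: float = 4.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call fn(), retrying on `retry_on` with capped, jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential growth: base, 2*base, 4*base, ... capped at max_delay
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # "Equal jitter": wait between delay/2 and delay so concurrent clients spread out
            sleep(delay / 2 + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Because `graph.stream(...)` returns a lazy iterator, the error only surfaces while iterating, so the retried callable has to consume the stream, for example `retry_with_backoff(lambda: list(graph.stream({"question": "What is Task Decomposition?"}, stream_mode="messages")), retry_on=(ResourceExhausted,))`. A retry restarts the stream from the beginning.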

https://console.cloud.google.com/iam-admin/quotas? is a fxxking mess, an Amazon jungle to explore when trying to pinpoint the cause of the issue! The page lists 12,525 “Quotas and System Limits”, and after scrolling through the first few pages of the table I have not seen any quota being exceeded!


See also: https://github.com/langchain-ai/langchain/issues/22241

According to https://aistudio.google.com/prompts/new_chat, the free tier allows 15 RPM and 1,500 requests per day, but the execution above is definitely well below that. So why the 429?
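To rule out the client itself exceeding a per-minute budget such as the 15 RPM quoted above, a small sliding-window limiter can be called before each request. The sketch below is a hypothetical helper (`SlidingWindowLimiter` is not a library class). Its clock and sleep functions are injectable so it can be tested without actually waiting. It only bounds this client's own request rate: if the 429 persists with the limiter in place, the limit being hit is probably not a per-client RPM. Recent versions of langchain-core also ship a token-bucket `InMemoryRateLimiter` in `langchain_core.rate_limiters`, which chat models accept through a `rate_limiter` argument.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical helper: blocks so at most `max_calls` calls start in any `period`-second window."""

    def __init__(
        self,
        max_calls: int = 15,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.starts: Deque[float] = deque()  # start times of calls inside the current window

    def acquire(self) -> None:
        while True:
            now = self.clock()
            # Forget calls that started a full period ago or earlier
            while self.starts and now - self.starts[0] >= self.period:
                self.starts.popleft()
            if len(self.starts) < self.max_calls:
                self.starts.append(now)
                return
            # Window is full: wait until the oldest call leaves it, then re-check
            self.sleep(self.period - (now - self.starts[0]))
```

Call `limiter.acquire()` right before each model request. Keep in mind that one graph run may issue more than one request, so the number of requests can be higher than the number of questions asked.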