Vertex AI enterprise vs Gemini API (Vertex AI express) quota limitations

I am building SaaS products and services on GCP infrastructure and may need quota increases across several microservices. I have read some of the forums and there doesn't seem to be an answer. I have switched a few components between the enterprise Vertex AI route and the Gemini API express tier, which piggybacks on the Vertex AI component and seems to allow more LLM requests/responses. Throttling still happens, and 429 errors occur because of limits on how many calls can be made to an LLM on GCP. But if we are building services that require multiple calls to be chained together because of RAG dependencies, this won't work.

I am testing with gemini-2.5-flash-lite and could not find anywhere in the IAM & Admin > Quotas page of the Google Cloud console to request quota increases for it.