I keep receiving 429 Quota exceeded errors when trying to create embeddings. Looking at my quotas, I can see “online_prediction_requests_per_base_model” is limited to 5/minute.
Welcome and thank you for reaching out to our community.
I understand that our documentation can be confusing at times, so let me help you get a clearer picture of our quotas and limits.
The base_model:textembedding-gecko does have a quota of 600 requests per minute, but each request is limited to 5 input texts. This means that you can send a maximum of 600 requests per minute, each containing at most 5 input texts, as shown in the screenshot that you have provided.
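To stay within both limits on the client side, one option is to split your texts into groups of at most 5 and space the requests out. The following is only a minimal sketch: `embed_batch` is a hypothetical placeholder for whatever function you use to call the embedding model (it is not part of any SDK), and the two constants are simply the numbers quoted above, so set them to whatever your quota page actually shows.

```python
import time
from typing import Callable, List, Sequence

MAX_TEXTS_PER_REQUEST = 5   # per-request input limit quoted above
REQUESTS_PER_MINUTE = 600   # per-minute quota quoted above; use the value from your console


def chunk(texts: Sequence[str], size: int = MAX_TEXTS_PER_REQUEST) -> List[List[str]]:
    """Split texts into consecutive batches of at most `size` items."""
    return [list(texts[i:i + size]) for i in range(0, len(texts), size)]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], list],  # hypothetical: your own call to the model
    requests_per_minute: int = REQUESTS_PER_MINUTE,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Embed all texts, one batch per request, pacing requests under the quota."""
    min_interval = 60.0 / requests_per_minute  # minimum seconds between requests
    results: list = []
    last_call = None
    for batch in chunk(texts):
        if last_call is not None:
            # Wait out the remainder of the interval since the previous request.
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = time.monotonic()
        results.extend(embed_batch(batch))
    return results
```

For example, if your project is currently limited to 5 requests per minute, calling `embed_all(texts, my_embed_fn, requests_per_minute=5)` would send one batch of up to 5 texts roughly every 12 seconds.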
Please note that you can also reach out to Vertex AI Support to discuss this in more detail.