Persistent 429 RESOURCE_EXHAUSTED error with gemini-3.1-flash-image-preview

Hello everyone,

I am developing a face-swapping application using the Python google-genai SDK. Currently, I’m evaluating the gemini-3.1-flash-image-preview model via Vertex AI.

During testing, the pipeline frequently crashes, and I consistently hit this error:

Critical Error: 429 RESOURCE_EXHAUSTED. {'error': {'code': 429, 'message': 'Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/error-code-429 for more details.', 'status': 'RESOURCE_EXHAUSTED'}}

What are the best practices and architectural changes for handling or preventing these 429 errors when working with preview image models?

I came across Provisioned Throughput as an option, but it is not cost-efficient for our app.

Any advice or suggestions would be greatly appreciated!