Extremely high latency on Gemini models

Hey there,

I’m seeing extremely high latencies (60+ seconds to first token) on the Gemini Flash models (2.5 Flash and 2.5 Flash-Lite). The input is only around 2k tokens, so not much at all. Interestingly, the issue seems to resolve itself after a couple of these slow generations, and time to first token drops to around 500 ms. After an extended period of inactivity (10+ minutes), though, the latency shoots back up to 60+ seconds.

Any ideas what might be happening?

Cheers,
Mugeeb

Turns out it was a DNS issue. Resetting the DNS cache resolved it, and latency is now consistent again!
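
In case it helps anyone who hits the same symptoms, here is a rough sketch of how the DNS lookup could be timed on its own, separately from the model call. It only uses Python's standard `socket` module. The helper name `time_dns_lookup` and the hostname in the usage comment are my own assumptions for illustration; substitute whatever endpoint your client actually talks to.

```python
import socket
import time


def time_dns_lookup(host, port=443, resolver=socket.getaddrinfo):
    """Time a single name lookup for `host`.

    Returns (elapsed_seconds, sorted list of unique IP addresses).
    `resolver` is injectable only so the helper can be exercised
    without network access; by default it is socket.getaddrinfo.
    """
    start = time.perf_counter()
    results = resolver(host, port, type=socket.SOCK_STREAM)
    elapsed = time.perf_counter() - start
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in results})
    return elapsed, addresses


# Example usage (hostname is an assumption, not taken from the post):
#
#   for attempt in range(3):
#       elapsed, addresses = time_dns_lookup("generativelanguage.googleapis.com")
#       print(f"lookup {attempt + 1}: {elapsed * 1000:.1f} ms -> {addresses}")
```

If the first lookup after a long idle period takes many seconds while later ones return in milliseconds, that would point at name resolution rather than the model itself.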
