Hey there,
I’m seeing extremely high latencies (60+ seconds to first token) on the Gemini Flash models (2.5 Flash and 2.5 Flash-Lite). The input is only around 2k tokens, so not much at all. Interestingly, the issue seems to resolve itself after a couple of these slow generations, and time to first token drops to around 500 ms. After an extended period of inactivity (10+ minutes), though, the latency shoots back up to 60+ seconds.
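For anyone who wants to reproduce the numbers, here is a minimal sketch of how time to first token could be measured around any streaming call. It assumes the response is exposed as a Python iterator of text chunks. `time_to_first_chunk` and `fake_stream` are hypothetical names for illustration only: `fake_stream` is a stand-in that simulates a delay and is not the Gemini SDK, so swap in your own streaming request.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def time_to_first_chunk(
    start_stream: Callable[[], Iterable[str]],
) -> Tuple[float, Optional[str]]:
    """Return (seconds until the first chunk arrives, the first chunk).

    The timer starts before the stream is created, so request setup
    time is included. The chunk is None if the stream is empty.
    """
    t0 = time.perf_counter()
    stream = iter(start_stream())
    first = next(stream, None)
    return time.perf_counter() - t0, first


def fake_stream(delay_s: float) -> Iterator[str]:
    # Hypothetical stand-in for a real streaming request:
    # waits delay_s before the first chunk, then yields two chunks.
    time.sleep(delay_s)
    yield "hello"
    yield " world"


if __name__ == "__main__":
    # Simulated "cold" and "warm" calls; the delays are made up.
    for label, delay in [("cold", 0.2), ("warm", 0.01)]:
        ttft, chunk = time_to_first_chunk(lambda: fake_stream(delay))
        print(f"{label}: first chunk {chunk!r} after {ttft:.3f}s")
```

Logging this per request together with the idle time since the previous request should show whether the slow calls line up with the 10+ minute gaps.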
Any ideas what might be happening?
Cheers,
Mugeeb