Vertex AI fine-tuned Gemini 2.5 Flash: 5x-10x latency increase for 11 days, silence on support ticket

We are experiencing a significant latency increase with our fine-tuned Gemini 2.5 Flash model on Vertex AI in the us-west1 region, specifically for endpoint ID {Endpoint ID}.

P95 latency has jumped from approximately 5 seconds to over 30 seconds, starting around November 19, 2025, between 7:55 PM and 9:25 PM UTC, without any changes on our end. (Times in the screenshot are UTC+9, so the jump appears there as Nov 20.)

This issue has persisted for 11 days now. We have already reviewed Vertex AI monitoring metrics, checked Google Cloud service health, inspected the model version and deployment details, analyzed input/output token counts, and tested with a baseline request (sketched below).
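For reference, the baseline test was essentially the following: one fixed prompt against the tuned endpoint, timed client-side. This is a minimal sketch assuming the google-genai Python SDK routed through Vertex AI; the project ID, endpoint ID, and prompt shown here are placeholders, not the redacted values above.

```python
import time

from google import genai

# Placeholders; the real project and endpoint IDs are redacted in this post.
PROJECT_ID = "your-project-id"
TUNED_ENDPOINT = f"projects/{PROJECT_ID}/locations/us-west1/endpoints/your-endpoint-id"

# Route requests through Vertex AI rather than the Gemini Developer API.
client = genai.Client(vertexai=True, project=PROJECT_ID, location="us-west1")

start = time.perf_counter()
response = client.models.generate_content(
    model=TUNED_ENDPOINT,
    contents="Fixed baseline prompt, identical for every measurement.",
)
elapsed = time.perf_counter() - start

print(f"latency: {elapsed:.2f}s")
print(f"output tokens: {response.usage_metadata.candidates_token_count}")
```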

This is severely affecting our production traffic: users now wait 30+ seconds, and in the worst cases 1-2 minutes, instead of 5 seconds for a response.

Model string: projects/{Project ID}/locations/us-west1/endpoints/{Endpoint ID}

Model ID: {Model ID}

Endpoint ID: {Endpoint ID}

Traffic: Pay-as-you-go

Prediction type: Online predictions

Time of latency: Starting around November 19, 2025, between 7:55 PM and 9:25 PM UTC. Continuing until present (11 days and counting).

Tokens per query (average): ~900 input, ~500 output. This has been stable throughout and is not the source of the latency.

Associated Errors: None

No response on the support ticket for 8 days now.

Enterprises have poured thousands of dollars into building fine-tuned, task-specific models precisely for low-latency performance in end-user-facing applications, yet are now being stonewalled. Note that non-fine-tuned 2.5 Flash inference still shows exactly the same latency as before, including with the exact same prompts; a comparison sketch follows.
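A sketch of that side-by-side comparison, under the same assumptions as the snippet above (placeholder IDs and prompt; the base model string is the standard "gemini-2.5-flash", and P95 is taken over a small fixed number of identical requests):

```python
import statistics
import time

from google import genai

# Placeholders, as in the previous sketch.
PROJECT_ID = "your-project-id"
TUNED_ENDPOINT = f"projects/{PROJECT_ID}/locations/us-west1/endpoints/your-endpoint-id"
BASE_MODEL = "gemini-2.5-flash"

client = genai.Client(vertexai=True, project=PROJECT_ID, location="us-west1")

def p95_latency(model: str, prompt: str, n: int = 20) -> float:
    """Send the same prompt n times and return the 95th-percentile latency in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        client.models.generate_content(model=model, contents=prompt)
        samples.append(time.perf_counter() - start)
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples, n=20)[-1]

prompt = "One of the exact production prompts goes here."
print(f"base 2.5 Flash P95:  {p95_latency(BASE_MODEL, prompt):.1f}s")      # unchanged, ~5s
print(f"tuned 2.5 Flash P95: {p95_latency(TUNED_ENDPOINT, prompt):.1f}s")  # now 30s+
```

The inline comments on the two print lines reflect what we observe: the base model is unaffected, while the tuned endpoint, with the same prompts, is 5x-10x slower than before the regression.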