Vertex AI Priority PayGo silently downgraded to ON_DEMAND (trafficType) despite correct header — works on Gemini Developer API, same account

Alessio_Ballabio · June 16, 2026, 7:26pm

Hi all,

I’m trying to use Priority PayGo on Vertex AI, but every request is silently
downgraded to Standard: usageMetadata.trafficType returns ON_DEMAND instead of
ON_DEMAND_PRIORITY. HTTP is 200, no error.

SETUP

Surface: Vertex AI (aiplatform.googleapis.com), auth via ADC
Global endpoint:
https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/google/models/gemini-2.5-pro:generateContent
Header: X-Vertex-AI-LLM-Shared-Request-Type: priority
(also tested adding X-Vertex-AI-LLM-Request-Type: shared to force shared/priority only)

REPRO (raw curl)
curl -X POST
-H “Authorization: Bearer $(gcloud auth print-access-token)”
-H “Content-Type: application/json”
-H “X-Vertex-AI-LLM-Request-Type: shared”
-H “X-Vertex-AI-LLM-Shared-Request-Type: priority”
“https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/google/models/gemini-2.5-pro:generateContent”
-d ‘{“contents”:[{“role”:“user”,“parts”:[{“text”:“ping”}]}]}’

=> HTTP 200, usageMetadata.trafficType = “ON_DEMAND” (expected: ON_DEMAND_PRIORITY)

ALREADY RULED OUT

The header IS actually on the wire (verified at the HTTP/fetch layer, not just in SDK config).
Same result via raw curl, the official google-genai Python SDK (vertexai=True, header in http_options), and LangChain.
Tested global AND regional endpoints (us-central1, europe-west4); api_version v1 AND v1beta1.
Tested multiple priority-capable models (gemini-2.5-pro, gemini-3-flash-preview, gemini-3.1-pro-preview) → always ON_DEMAND.
Added X-Goog-User-Project → still HTTP 200, NO 429 (so it’s not a quota-exhausted case).
Project is under a GCP Organization (not standalone); billing account is enabled and open, direct (not reseller-managed).

KEY CONTRAST
The SAME priority works on the Gemini Developer API (generativelanguage.googleapis.com)
with the same account’s API key, using “service_tier”: “priority” in the body
→ response header x-gemini-service-tier: priority.
So the account is clearly eligible; only Vertex AI doesn’t apply it.

QUESTION
Why does Vertex AI silently downgrade priority to ON_DEMAND when the header is correct
and the endpoint is global? Is there a project/org-level enablement, billing tier, or
ramp allocation required beyond sending the header? How do I get ON_DEMAND_PRIORITY on
Vertex for my project — is there any self-service step, or does it require manual backend
enablement by Google?

Thanks!

Miqua · June 17, 2026, 5:35pm

A post was merged into an existing topic: Google Skills Arcade 2026 Tiers

Topic		Replies	Views
We are getting a lot of 429 Errors calling the Vertex API, (paid) Priority tier is not respected Generative AI & Foundational Models gemini	1	17	June 17, 2026
Gemini 3.1 Flash Image started billing at unsupported "Priority PayGo" SKU (Cost doubling by surprise) AI APIs agent-platform-vision	4	165	March 31, 2026
Provisioned Throughput for Gemini Custom ML & MLOps gemini-in-looker , agent-platform	0	320	December 6, 2024

Vertex AI Priority PayGo silently downgraded to ON_DEMAND (trafficType) despite correct header — works on Gemini Developer API, same account

AI Suggested topics