Hi all,
I’m trying to use Priority PayGo on Vertex AI, but every request is silently
downgraded to Standard: usageMetadata.trafficType returns ON_DEMAND instead of
ON_DEMAND_PRIORITY. HTTP is 200, no error.
SETUP
- Surface: Vertex AI (aiplatform.googleapis.com), auth via ADC
- Global endpoint:
https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/google/models/gemini-2.5-pro:generateContent - Header: X-Vertex-AI-LLM-Shared-Request-Type: priority
(also tested adding X-Vertex-AI-LLM-Request-Type: shared to force shared/priority only)
REPRO (raw curl)
curl -X POST
-H “Authorization: Bearer $(gcloud auth print-access-token)”
-H “Content-Type: application/json”
-H “X-Vertex-AI-LLM-Request-Type: shared”
-H “X-Vertex-AI-LLM-Shared-Request-Type: priority”
“https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/google/models/gemini-2.5-pro:generateContent”
-d ‘{“contents”:[{“role”:“user”,“parts”:[{“text”:“ping”}]}]}’
=> HTTP 200, usageMetadata.trafficType = “ON_DEMAND” (expected: ON_DEMAND_PRIORITY)
ALREADY RULED OUT
- The header IS actually on the wire (verified at the HTTP/fetch layer, not just in SDK config).
- Same result via raw curl, the official google-genai Python SDK (vertexai=True, header in http_options), and LangChain.
- Tested global AND regional endpoints (us-central1, europe-west4); api_version v1 AND v1beta1.
- Tested multiple priority-capable models (gemini-2.5-pro, gemini-3-flash-preview, gemini-3.1-pro-preview) → always ON_DEMAND.
- Added X-Goog-User-Project → still HTTP 200, NO 429 (so it’s not a quota-exhausted case).
- Project is under a GCP Organization (not standalone); billing account is enabled and open, direct (not reseller-managed).
KEY CONTRAST
The SAME priority works on the Gemini Developer API (generativelanguage.googleapis.com)
with the same account’s API key, using “service_tier”: “priority” in the body
→ response header x-gemini-service-tier: priority.
So the account is clearly eligible; only Vertex AI doesn’t apply it.
QUESTION
Why does Vertex AI silently downgrade priority to ON_DEMAND when the header is correct
and the endpoint is global? Is there a project/org-level enablement, billing tier, or
ramp allocation required beyond sending the header? How do I get ON_DEMAND_PRIORITY on
Vertex for my project — is there any self-service step, or does it require manual backend
enablement by Google?
Thanks!