Using Vertex AI with OpenAI‑Compatible Endpoints and Enforcing US‑Only

Hi everyone,

I’m trying to understand whether it’s possible to use the Vertex AI Gemini API through the OpenAI‑compatible interface while ensuring that all requests are processed exclusively within US regions.

My understanding is:

  • When using the Vertex AI Gemini API directly (via Vertex endpoints), I can select a regional endpoint (for example, a US region) to ensure that requests are routed and processed there.

  • However, when using a Gemini API key from Google AI Studio, requests are routed through the global endpoint, which does not guarantee US‑only processing.

What I’d like to confirm is:

If my application is configured to use OpenAI‑compatible endpoints (per: https://ai.google.dev/gemini-api/docs/openai),

is there a way to force all requests to go to a US‑based Vertex AI regional endpoint only?

Or, put differently:

  • Is regional routing supported when using OpenAI compatibility for Vertex AI?

  • Or is it not possible, meaning I would need to use the native Vertex AI client/endpoint configuration instead of the OpenAI‑compatible interface?

I would really appreciate any guidance or best practices for achieving US‑only request routing in this OpenAI‑compatible setup.

Thank you in advance for your insights and suggestions.

Hello @Ujwal_Shah,

The Vertex AI documentation page about OpenAI compatibility may be a better fit for your needs than the Gemini API page you linked. It includes a snippet that shows how to configure a specific region:

import openai
from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update the placeholder values below
project_id = "PROJECT_ID"
location = "global"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
  base_url=f"https://aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
  api_key=credentials.token
)

response = client.chat.completions.create(
  model="google/gemini-2.0-flash-001",
  messages=[
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain to me how AI works"}
  ]
)

print(response.choices[0].message)
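To keep processing in a US region specifically, note that regional Vertex AI endpoints use a region-prefixed hostname, while the bare `aiplatform.googleapis.com` host in the snippet above is for `location = "global"`. A minimal sketch of building the regional base URL (assuming `us-central1` as an example US region; `PROJECT_ID` is a placeholder):

```python
project_id = "PROJECT_ID"   # TODO(developer): your Google Cloud project ID
location = "us-central1"    # a US region keeps request processing in the US

# Regional Vertex AI endpoints are served from a region-prefixed host,
# "{location}-aiplatform.googleapis.com". The bare host
# "aiplatform.googleapis.com" is used only with location = "global".
base_url = (
    f"https://{location}-aiplatform.googleapis.com/v1"
    f"/projects/{project_id}/locations/{location}/endpoints/openapi"
)

# Pass this to openai.OpenAI(base_url=base_url, api_key=credentials.token)
# exactly as in the snippet above.
print(base_url)
```

Since both the hostname and the `locations/{location}` path segment carry the region, requests sent through this client stay pinned to the chosen US region rather than the global endpoint.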
