Hi everyone,
I’m working on a Flask application deployed on Google Cloud Run that uses Vertex AI via the langchain_google_vertexai.ChatVertexAI module. The application is containerized and works perfectly when I test it in Google Cloud Shell or run it locally using my own credentials.
However, when deployed in Cloud Run, the app only works for the first request after deployment. Any subsequent requests result in errors like:
ValueError: Could not resolve project_id
Here’s a summary of my setup:
I’ve set the correct project and location parameters in ChatVertexAI.
I’m using the default Cloud Run service account, which has the Vertex AI User role.
My generate() function creates a fresh LLM and agent for each request.
The first request always succeeds, but any follow-up requests fail (unless I wait several minutes, after which one more request succeeds).
Logs point to issues in litellm token fetching: “Could not resolve project_id”, even though it’s passed explicitly.
It seems like one of the following is happening:
Token refresh or credential caching is broken in Cloud Run,
The default service account doesn’t work well with Vertex AI in this setup,
Or concurrency/memory limits are causing unexpected failures.
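To narrow down the token-refresh hypothesis, one option is to ask the metadata server directly for the service account token and log how long it remains valid. The following is a minimal sketch, not a fix: it assumes the code runs inside the Cloud Run container, and the names fetch_token_json and token_summary are made up for illustration. It deliberately reports only the token type and lifetime, never the token itself.

```python
import json
import urllib.request
from typing import Callable, Dict

# Metadata server endpoint for the default service account token.
TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def fetch_token_json(timeout: float = 2.0) -> str:
    """Fetch the raw token response (hypothetical helper, GCP-only)."""
    request = urllib.request.Request(
        TOKEN_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def token_summary(fetch: Callable[[], str] = fetch_token_json) -> Dict[str, object]:
    """Summarize the token response without exposing the secret value."""
    payload = json.loads(fetch())
    return {
        "token_type": payload.get("token_type"),
        "expires_in_seconds": payload.get("expires_in"),
        "has_access_token": bool(payload.get("access_token")),
    }
```

Logging token_summary() at the start of each request would show whether a valid token is still available when the second request fails. If it is, the problem is more likely in how the client library resolves the project than in token refresh.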
I’ve already tried setting GOOGLE_CLOUD_PROJECT, manually refreshing credentials, and verifying the project ID. None of these fixed the issue.
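For anyone trying to reproduce this, a quick way to see what project ID the container itself can resolve is to check the environment variables and then the metadata server. This is only a diagnostic sketch using the standard library; the function name resolve_project_id is hypothetical, and checking GCLOUD_PROJECT as a second variable is an assumption about what might be set in the environment.

```python
import os
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Metadata server endpoint that returns the project ID as plain text.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/project/project-id"
)


def resolve_project_id(timeout: float = 2.0) -> Tuple[Optional[str], str]:
    """Return (project_id, source) so logs show where the value came from."""
    # 1. Environment variables take priority in this sketch.
    for var in ("GOOGLE_CLOUD_PROJECT", "GCLOUD_PROJECT"):
        value = os.environ.get(var)
        if value:
            return value, "env:" + var

    # 2. Fall back to the metadata server (only reachable on GCP).
    request = urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8").strip(), "metadata-server"
    except (urllib.error.URLError, OSError) as exc:
        # Report the failure instead of raising, so it can be logged per request.
        return None, "unresolved: " + str(exc)
```

Calling resolve_project_id() inside generate() and logging the result on every request would show whether the resolved value or its source changes between the first request and the failing ones.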
Has anyone faced this issue with LangChain + Vertex AI + Cloud Run?
What’s the best way to make this stable in production?