Subject: Urgent: Deployed Vertex AI Tuned Model Consistently Fails with “FAILED_PRECONDITION” Error
Hello Google Cloud Community,
I am writing to report a critical issue with a tuned model deployed on a Vertex AI Endpoint. For the past two days, I have been unable to get any successful predictions from my deployed model, which is completely blocking my development workflow.
I have conducted extensive and systematic debugging, including deleting and recreating the endpoint, and every attempt has failed with contradictory error messages. I am confident the issue is not with the API call itself, but with the state of the deployed model endpoint on the platform.
Asset Details:
-
Project ID: (PII Removed by Staff)
-
Region: us-central1
-
Current (New) Endpoint ID: 3825030528730398720
-
Tuned Model Name: 超能力250 (Superpower 250)
-
Base Model Used for Tuning: gemini-1.0-pro-002 (Please verify if this is correct)
-
Tuning Data Format: JSONL, with each line formatted as {“contents”: [{“role”: “user”, “parts”: […]}, {“role”: “model”, “parts”: […]}]}
Summary of Debugging Steps and Failures:
We have methodically tested every possible payload structure against the endpoint. The results are a logical paradox, strongly indicating a server-side issue.
1. Test with Standard prompt Payload:
We sent a standard, documented payload using cURL to the new, correctly identified endpoint (…398720).
COMMAND:
Generated code
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d '{"instances": [{"prompt": "Hello, world!"}]}' \
"https://us-central1-aiplatform.googleapis.com/v1/projects/gen-lang-client-0656296523/locations/us-central1/endpoints/3825030528730398720:predict"
Use code with caution.
RESULT:
Generated code
{
"error": {
"code": 400,
"message": "Precondition check failed.",
"status": "FAILED_PRECONDITION"
}
}
Use code with caution.
2. Test with contents Payload (without instances wrapper):
We hypothesized that the instances wrapper was incorrect.
COMMAND:
Generated code
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d '{"contents":[{"role":"user","parts":[{"text":"Hello, world!"}]}]}' \
"https://us-central1-aiplatform.googleapis.com/v1/projects/gen-lang-client-0656296523/locations/us-central1/endpoints/3825030528730398720:predict"
Use code with caution.
RESULT:
Generated code
{
"error": {
"code": 400,
"message": "Invalid JSON payload received. Unknown name \"contents\": Cannot find field.",
"status": "INVALID_ARGUMENT"
}
}
Use code with caution.
Analysis: The contradictory results prove that the instances wrapper is mandatory, but the model rejects any content inside it.
Conclusion:
This logical loop, persisting even after completely deleting and redeploying the model to a new endpoint, strongly suggests a persistent configuration corruption in the tuned model asset itself or a bug in the Vertex AI deployment process for this specific model type.
Could the community or any Google engineers here offer insight into why a newly deployed endpoint would consistently fail with FAILED_PRECONDITION on a standard payload?
I have attached all relevant screenshots from the Cloud Shell detailing these failed cURL attempts. Thank you for your time and help.