Urgent: Deployed Vertex AI Tuned Model Consistently Fails with "FAILED_PRECONDITION" Error

Hello Google Cloud Community,

I am writing to report a critical issue with a tuned model deployed on a Vertex AI Endpoint. For the past two days, I have been unable to get any successful predictions from my deployed model, which is completely blocking my development workflow.

I have conducted extensive and systematic debugging, including deleting and recreating the endpoint, and every attempt has failed with contradictory error messages. I am confident the issue is not with the API call itself, but with the state of the deployed model endpoint on the platform.

Asset Details:

  • Project ID: (PII Removed by Staff)

  • Region: us-central1

  • Current (New) Endpoint ID: 3825030528730398720

  • Tuned Model Name: 超能力250 (Superpower 250)

  • Base Model Used for Tuning: gemini-1.0-pro-002 (Please verify if this is correct)

  • Tuning Data Format: JSONL, with each line formatted as {"contents": [{"role": "user", "parts": […]}, {"role": "model", "parts": […]}]}
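
Before blaming the endpoint, it may be worth ruling out malformed training data. The sketch below is a quick validator for the JSONL format described above (the sample line and field checks are based only on that description); it catches copy-paste artifacts such as curly quotes as well as missing or mistyped fields.

```python
import json

# Hypothetical sample line in the training-data format described above.
sample_line = (
    '{"contents": [{"role": "user", "parts": [{"text": "Hi"}]}, '
    '{"role": "model", "parts": [{"text": "Hello!"}]}]}'
)

def validate_jsonl_line(line: str) -> list[str]:
    """Return a list of problems found in one JSONL training line."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    contents = record.get("contents")
    if not isinstance(contents, list) or not contents:
        return ['missing or empty "contents" list']
    problems = []
    for i, turn in enumerate(contents):
        if turn.get("role") not in ("user", "model"):
            problems.append(f'turn {i}: unexpected "role" {turn.get("role")!r}')
        if not isinstance(turn.get("parts"), list):
            problems.append(f'turn {i}: "parts" is not a list')
    return problems

print(validate_jsonl_line(sample_line))  # → []
```

Running this over every line of the tuning file (one call per line) would confirm whether the training data itself was well-formed.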

Summary of Debugging Steps and Failures:

We have methodically tested both documented payload structures against the endpoint. The results are mutually contradictory, which strongly suggests a server-side issue.

1. Test with Standard prompt Payload:
We sent a standard, documented payload using cURL to the new, correctly identified endpoint (…398720).

COMMAND:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d '{"instances": [{"prompt": "Hello, world!"}]}' \
"https://us-central1-aiplatform.googleapis.com/v1/projects/gen-lang-client-0656296523/locations/us-central1/endpoints/3825030528730398720:predict"

RESULT:

{
  "error": {
    "code": 400,
    "message": "Precondition check failed.",
    "status": "FAILED_PRECONDITION"
  }
}

2. Test with contents Payload (without instances wrapper):
We hypothesized that the instances wrapper was incorrect.

COMMAND:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d '{"contents":[{"role":"user","parts":[{"text":"Hello, world!"}]}]}' \
"https://us-central1-aiplatform.googleapis.com/v1/projects/gen-lang-client-0656296523/locations/us-central1/endpoints/3825030528730398720:predict"

RESULT:

{
  "error": {
    "code": 400,
    "message": "Invalid JSON payload received. Unknown name \"contents\": Cannot find field.",
    "status": "INVALID_ARGUMENT"
  }
}

Analysis: Taken together, the two errors contradict each other: the second shows that the instances wrapper is mandatory at this URL, while the first shows the endpoint rejects a standard, documented payload inside that wrapper.
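
One shape the two tests above do not cover is combining them: nesting the contents turn (rejected at the top level in test 2) inside the instances envelope demanded in test 1. A minimal sketch of that payload; this pairing is a guess to try next, not a documented schema:

```python
import json

# Hypothesis only: wrap the "contents" turn from test 2 inside the
# "instances" envelope that the error in test 1 implies is required.
instance = {
    "contents": [
        {"role": "user", "parts": [{"text": "Hello, world!"}]}
    ]
}
payload = json.dumps({"instances": [instance]})
print(payload)
```

If this combined shape also fails, that would further support the server-side hypothesis; it may also be worth double-checking whether this model type expects a method verb other than :predict on the endpoint URL.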

Conclusion:

This logical loop, persisting even after completely deleting and redeploying the model to a new endpoint, strongly suggests a persistent configuration corruption in the tuned model asset itself or a bug in the Vertex AI deployment process for this specific model type.

Could the community or any Google engineers here offer insight into why a newly deployed endpoint would consistently fail with FAILED_PRECONDITION on a standard payload?

I have attached all relevant screenshots from the Cloud Shell detailing these failed cURL attempts. Thank you for your time and help.