Vertex AI Evaluation Issues (429 RESOURCE_EXHAUSTED and TOOL_USE_QUALITY Input Schema Error)

We are writing to report the following two issues encountered while using the Vertex AI Evaluation feature.


1. 429 RESOURCE_EXHAUSTED Errors During Vertex AI Evaluation (Judge model resource exhausted)

Issue Description

When running evaluations using the Vertex AI Python SDK Evaluation API (a minimal sketch of the call is included after the Environment list below), the following error intermittently occurs for some evaluation cases:

429 RESOURCE_EXHAUSTED
Judge model resource exhausted. Please try again later.

Within the same evaluation request, some cases are evaluated successfully, while others fail and are returned with their score and explanation left as None.
In the summary metrics, these failures are counted as error=1.

Environment

  • Location: global

  • Authentication: Application Default Credentials (user credentials)

  • API used: Client().evals.evaluate(...)

  • Metric used: RubricMetric.GENERAL_QUALITY
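
For reference, below is a minimal sketch of how the evaluation is invoked. The import paths, project ID, and the tiny dataset are simplified assumptions for illustration; the client call, location, and metric match the environment above.

import pandas as pd
from vertexai import Client, types  # assumed import path for the Gen AI evaluation SDK

client = Client(project="my-project", location="global")  # ADC user credentials

# Tiny illustrative dataset; real runs use our production prompts and responses.
eval_dataset = pd.DataFrame(
    {
        "prompt": ["Summarize the attached release notes."],
        "response": ["The release adds feature X and fixes bug Y."],
    }
)

eval_result = client.evals.evaluate(
    dataset=eval_dataset,
    metrics=[types.RubricMetric.GENERAL_QUALITY],
)
# Some rows are scored normally; others intermittently come back with
# "429 RESOURCE_EXHAUSTED: Judge model resource exhausted." and error=1.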

Mitigation Attempts

  • Implemented batch-level retry logic in addition to the SDK’s default retry behavior (a simplified sketch of this wrapper follows this list)

  • However, when retries are enabled, processing time exceeds 1 minute per sample, which is not practical for real-world usage
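
To make the mitigation concrete, here is a simplified sketch of our batch-level retry wrapper. run_with_batch_retry, run_batch, and is_complete are our own names, not SDK APIs: run_batch() wraps one client.evals.evaluate(...) call for a batch, and is_complete(result) checks that no case was left with score=None / error=1. The backoff values are our own choices.

import random
import time

def run_with_batch_retry(run_batch, is_complete, max_attempts=4):
    # Re-run the whole evaluation batch until no case is left unscored,
    # or until max_attempts is reached.
    delay = 5.0  # initial backoff in seconds (our own choice)
    result = None
    for attempt in range(1, max_attempts + 1):
        result = run_batch()
        if is_complete(result) or attempt == max_attempts:
            return result
        time.sleep(delay + random.uniform(0.0, 1.0))  # exponential backoff with jitter
        delay *= 2
    return result

The wrapper retries the whole batch rather than individual cases; combined with the backoff delays, this is what pushes processing time past one minute per sample when 429 errors recur.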

Questions

In situations where judge model calls partially fail with 429 errors, is there a recommended approach to ensure stable evaluation without relying heavily on retries?

Additionally, when attempting to increase the relevant quota, the "Edit quota" option is disabled on the Quotas page.
Could you please advise on the correct procedure to request a quota increase for judge model usage in this case?


2. INVALID_ARGUMENT Error Related to tool_usage When Using TOOL_USE_QUALITY Metric

Issue Description

When running evaluations with RubricMetric.TOOL_USE_QUALITY, the evaluation fails with the following error:

400 INVALID_ARGUMENT
Error rendering metric prompt template:
Variable tool_usage is required but not provided.

Details

  • According to the official documentation, the tool_use_quality_v1 metric requires the following inputs:
    prompt, developer_instruction, tool_declarations, and intermediate_events;
    tool_usage is not documented as a required input.

  • However, during execution, the server-side evaluation logic appears to require the {tool_usage} variable, resulting in the error above.

  • This issue persists even when all of the following are true (a reproduction sketch follows this list):

    • intermediate_events are provided in Gemini-compatible function call / function response format

    • The client is initialized with HttpOptions(api_version="v1beta1")

    • The metric is explicitly pinned as RubricMetric.TOOL_USE_QUALITY(version="v1")
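
For completeness, below is a simplified reproduction sketch of the failing call, shown roughly as we construct it. The import paths, project ID, and row contents are assumptions for illustration; the column names follow the documented tool_use_quality_v1 inputs, and the intermediate_events entries mirror Gemini function call / function response turns.

import pandas as pd
from google.genai.types import HttpOptions  # assumed import path
from vertexai import Client, types          # assumed import path for the Gen AI evaluation SDK

client = Client(
    project="my-project",
    location="global",
    http_options=HttpOptions(api_version="v1beta1"),  # the v1beta1 pin mentioned above
)

# Column names follow the documented tool_use_quality_v1 inputs; row contents
# are abbreviated placeholders.
eval_dataset = pd.DataFrame(
    {
        "prompt": ["What is the weather in Tokyo right now?"],
        "developer_instruction": ["Always call the weather tool for weather questions."],
        "tool_declarations": [[{
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
        }]],
        "intermediate_events": [[
            {"role": "model", "parts": [{"function_call": {
                "name": "get_weather", "args": {"city": "Tokyo"}}}]},
            {"role": "user", "parts": [{"function_response": {
                "name": "get_weather", "response": {"temp_c": 21}}}]},
        ]],
        "response": ["It is currently 21°C and clear in Tokyo."],
    }
)

result = client.evals.evaluate(
    dataset=eval_dataset,
    metrics=[types.RubricMetric.TOOL_USE_QUALITY(version="v1")],
)
# Fails with:
# 400 INVALID_ARGUMENT
# Error rendering metric prompt template: Variable tool_usage is required but not provided.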

Questions

  1. Is it expected behavior that the TOOL_USE_QUALITY metric internally relies on a legacy prompt-template path requiring {tool_usage}?

  2. If tool_usage is indeed required, could you please provide the officially supported schema and a concrete example?

  3. Is the documented intermediate_events-based usage currently insufficient to run this metric successfully?


We would appreciate any root cause analysis, official guidance, or recommended workarounds.

Thank you for your support.