I am writing to inquire about the following two issues encountered while using the Vertex AI Evaluation feature.
1. 429 RESOURCE_EXHAUSTED Errors During Vertex AI Evaluation (Judge model resource exhausted)
Issue Description
When running evaluations using the Vertex AI Python SDK Evaluation API, the following error intermittently occurs for some evaluation cases:
429 RESOURCE_EXHAUSTED
Judge model resource exhausted. Please try again later.
Within the same evaluation request, some cases are evaluated successfully, while others fail with score and explanation remaining None.
In the summary metrics, these failures are counted as error=1.
Environment
- Location: global
- Authentication: Application Default Credentials (user credentials)
- API used: Client().evals.evaluate(...) (a simplified sketch of the call follows this list)
- Metric used: RubricMetric.GENERAL_QUALITY
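For reference, a simplified sketch of how the evaluation is invoked. The project ID and dataset rows are placeholders, and the import layout (vertexai.Client / vertexai.types) is our assumption based on the new Vertex AI SDK and may differ slightly by SDK version.

```python
# Simplified sketch of our evaluation call.
# "my-project" and the dataset rows are placeholders; the import layout is an
# assumption based on the new Vertex AI SDK and may differ by SDK version.
import pandas as pd
import vertexai
from vertexai import types

client = vertexai.Client(project="my-project", location="global")

# Minimal prompt/response dataset; our real dataset contains many more cases.
eval_dataset = pd.DataFrame(
    {
        "prompt": ["Summarize the following support ticket ..."],
        "response": ["The customer reports that ..."],
    }
)

eval_result = client.evals.evaluate(
    dataset=eval_dataset,
    metrics=[types.RubricMetric.GENERAL_QUALITY],
)

# Some cases come back scored; others fail with 429 RESOURCE_EXHAUSTED and keep
# score/explanation as None, and the summary metrics report error=1 for them.
```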
Mitigation Attempts
- Implemented batch-level retry logic in addition to the SDK's default retry behavior (a sketch of this retry follows this list)
- However, when retries are enabled, processing time exceeds 1 minute per sample, which is not practical for real-world usage
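The batch-level retry we added looks roughly like the sketch below. The extract_failed_rows helper is a hypothetical stand-in for our result parsing (it keeps the rows whose score/explanation came back as None), and the backoff parameters are illustrative.

```python
# Sketch of the batch-level retry we layered on top of the SDK's default retries.
# extract_failed_rows() is a hypothetical placeholder for our result parsing,
# which keeps the rows whose score/explanation came back as None (judge 429).
import time

import pandas as pd


def extract_failed_rows(eval_result, batch_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return the subset of batch_df whose judge calls failed."""
    raise NotImplementedError  # depends on how the evaluation result is inspected


def evaluate_with_retry(client, metrics, eval_df: pd.DataFrame,
                        max_attempts: int = 4, base_delay_s: float = 10.0):
    pending = eval_df
    results = []
    for attempt in range(max_attempts):
        result = client.evals.evaluate(dataset=pending, metrics=metrics)
        results.append(result)
        pending = extract_failed_rows(result, pending)
        if pending.empty:
            break
        # Exponential backoff before re-submitting only the failed cases.
        time.sleep(base_delay_s * (2 ** attempt))
    # Any rows still in `pending` failed on every attempt.
    return results, pending
```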
Questions
In situations where judge model calls partially fail with 429 errors, is there a recommended approach to ensure stable evaluation without relying heavily on retries?
Additionally, when attempting to increase the relevant quota, the Edit quota option is disabled on the Quotas page.
Could you please advise on the correct procedure to request a quota increase for judge model usage in this case?
2. INVALID_ARGUMENT Error Related to tool_usage When Using TOOL_USE_QUALITY Metric
Issue Description
When running evaluations with RubricMetric.TOOL_USE_QUALITY, the evaluation fails with the following error:
400 INVALID_ARGUMENT
Error rendering metric prompt template:
Variable tool_usage is required but not provided.
Details
- According to the official documentation, the tool_use_quality_v1 metric requires the following inputs: prompt, developer_instruction, tool_declarations, and intermediate_events; tool_usage is not documented as a required input.
- However, during execution, the server-side evaluation logic appears to require the {tool_usage} variable, resulting in the error above.
- This issue persists even when (a sketch of this setup follows the list):
  - intermediate_events are provided in Gemini-compatible function call / function response format
  - The client is initialized with HttpOptions(api_version="v1beta1")
  - The metric is explicitly pinned as RubricMetric.TOOL_USE_QUALITY(version="v1")
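Below is a simplified sketch of the setup that reproduces the error. The project ID, tool declaration, events, and responses are placeholders; the import layout, passing HttpOptions to the client, and JSON-encoding the tool_declarations / intermediate_events columns are our assumptions, and the intermediate_events shape shown is what we understand to be the Gemini-compatible function call / function response format.

```python
# Simplified sketch of the TOOL_USE_QUALITY setup that reproduces the error.
# Project ID, tool, events, and responses are placeholders. The import layout,
# passing HttpOptions to the client, and JSON-encoding the tool_declarations /
# intermediate_events columns are our assumptions about the intended usage.
import json

import pandas as pd
import vertexai
from vertexai import types
from google.genai.types import HttpOptions

client = vertexai.Client(
    project="my-project",  # placeholder
    location="global",
    http_options=HttpOptions(api_version="v1beta1"),
)

# Placeholder tool declaration.
tool_declarations = [{
    "function_declarations": [{
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }]
}]

# What we understand to be Gemini-compatible function call / function response events.
intermediate_events = [
    {"role": "model", "parts": [{"function_call": {"name": "get_weather", "args": {"city": "Tokyo"}}}]},
    {"role": "user", "parts": [{"function_response": {"name": "get_weather", "response": {"temp_c": 21}}}]},
]

eval_df = pd.DataFrame({
    "prompt": ["What's the weather in Tokyo right now?"],
    "developer_instruction": ["Use the weather tool whenever the user asks about weather."],
    "tool_declarations": [json.dumps(tool_declarations)],
    "intermediate_events": [json.dumps(intermediate_events)],
    "response": ["It is currently 21°C in Tokyo."],  # final model response (placeholder)
})

# Fails with: 400 INVALID_ARGUMENT ... "Variable tool_usage is required but not provided."
eval_result = client.evals.evaluate(
    dataset=eval_df,
    metrics=[types.RubricMetric.TOOL_USE_QUALITY(version="v1")],
)
```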
Questions
- Is it expected behavior that the TOOL_USE_QUALITY metric internally relies on a legacy prompt-template path requiring {tool_usage}?
- If tool_usage is indeed required, could you please provide the officially supported schema and a concrete example?
- Is the documented intermediate_events-based usage insufficient for successfully running this metric at the moment?
We would appreciate any root cause analysis, official guidance, or recommended workarounds.
Thank you for your support.