I am using Gemini’s batch prediction to estimate bounding boxes from a large number of images.
I have since discovered that the same prompts (and otherwise identical settings) yield drastically worse performance when using the batch API compared to Vertex AI’s regular chat API.
I have tried gemini-2.0-flash-001, gemini-2.0-flash-lite-001, and gemini-2.0-pro-exp-02-05.
Has anyone else run into similar issues?
It seems you are experiencing inconsistencies when estimating bounding boxes via the batch API versus the chat API. Here are some potential steps to help you investigate or address the issue:
Configuration Differences - Even with identical prompts and settings, outputs may vary because the two APIs handle requests differently. Review the API documentation for any batch-specific nuances that could affect results.
Pre-processing of Inputs - Batch processing pipelines can unintentionally alter data. Ensure inputs are pre-processed identically before being sent to either API.
Testing with Smaller Subsets - Run a small subset of your images through both the batch API and the chat API and compare the results directly. This can help isolate where the performance gap is occurring.
API Logs and Metrics - Check the batch API's logs and metrics for errors, warnings, or other indicators of why performance is dropping.
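To make the smaller-subset comparison concrete for bounding boxes, one simple way to quantify the gap is intersection-over-union (IoU) between the boxes the two APIs return for the same image. A minimal sketch, assuming boxes in the [ymin, xmin, ymax, xmax] 0-1000 normalized format Gemini typically returns; `iou` is an illustrative helper, not part of any SDK:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [ymin, xmin, ymax, xmax]
    (the 0-1000 normalized format Gemini typically returns)."""
    ya = max(box_a[0], box_b[0])
    xa = max(box_a[1], box_b[1])
    yb = min(box_a[2], box_b[2])
    xb = min(box_a[3], box_b[3])
    inter = max(0, yb - ya) * max(0, xb - xa)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

Averaging IoU over the subset gives a single number to compare the batch and chat paths, rather than eyeballing individual boxes.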
For more information about batch processing, you can read this documentation.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
But when processing the file with batch predictions, Gemini's output breaks. Some output does not even match the valid schema, and Gemini is using the Grounding with Google Search tool for some reason.
# THIS IS HOW I PREPARE FILE
import json

with open("input.jsonl", "w", encoding="utf-8") as f:
    for judgment in judgments:
        judgment_dict = {
            "id": judgment["id"],
            "request": {
                "contents": [
                    {
                        # "parts" must be a list of part objects, not a single dict
                        "parts": [
                            {"text": prompt.create_prompt(judgment["text_content"])}
                        ],
                        "role": "user",
                    }
                ],
                # generationConfig is a JSON object, not keyword arguments
                "generationConfig": {
                    "response_mime_type": "application/json",
                    "response_schema": JudgmentAnalysisPrompt.RESPONSE_SCHEMA,
                },
            },
        }
        f.write(json.dumps(judgment_dict) + "\n")
# THIS IS HOW I'M SENDING BATCH REQUEST
job = client.batches.create(
    model="gemini-2.0-flash-001",
    src="gs://.../test/input/input.jsonl",
    config=CreateBatchJobConfig(
        dest="gs://.../test/output/",
    ),
)
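Once the job completes, each line of the output file written under `dest` pairs the original request with either a `response` object or an error status. A hedged sketch for pulling the structured JSON back out of one output line (the field layout follows the documented batch output shape, but verify it against your own output files, as it can vary by SDK version):

```python
import json

def extract_prediction(line):
    """Pull the model's JSON text out of one line of the batch output file.
    Returns the parsed dict, or None for failed requests (which carry an
    error/status field instead of a response)."""
    record = json.loads(line)
    response = record.get("response")
    if response is None:
        return None
    text = response["candidates"][0]["content"]["parts"][0]["text"]
    # The request asked for application/json, so the text should parse.
    return json.loads(text)
```

Lines where this returns None (or where the inner `json.loads` raises) are exactly the requests whose output broke the schema, which makes it easy to count how often the batch path misbehaves.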
I can’t find any reasonable documentation for batch Vertex AI usage.
Even today, performance with the batch API is objectively worse than with the normal API, even when all parameters are explicitly set to the same values in both cases (to rule out the possibility that the defaults differ between them).