Turn off thinking for gemini-2.5-flash batch api in Vertex AI

xyzpoka · May 21, 2025, 1:41pm

gemini-2.5-flash-preview-05-20 support batch predications now (2025-05-21), doc is here:

Gemini 2.5 Flash | Generative AI on Vertex AI | Google Cloud

I want to use gemini-2.5-flash batch api (Python) in Vertex AI, I want to turn off thinking (thinkbudget=0), How to turn it off in config?

ericdong · May 21, 2025, 1:57pm

# Turn off thinking

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20"
    contents="What is AI?",
    config=GenerateContentConfig(
        thinking_config=ThinkingConfig(
            thinking_budget=0,
        )
    ),
)

You can add config in your batch input.
Please see more details about thinking config in https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_2_5_flash.ipynb

xyzpoka · May 21, 2025, 2:12pm

vertex gemini batch api example

from google import genai
    from google.genai.types import CreateBatchJobConfig, JobState, HttpOptions

    client = genai.Client(http_options=HttpOptions(api_version="v1"))

    # See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.batches.Batches.create
    job = client.batches.create(
        model="gemini-2.5-flash-preview-05-20",
        # Source link: https://storage.cloud.google.com/cloud-samples-data/batch/prompt_for_batch_gemini_predict.jsonl
        src="gs://cloud-samples-data/batch/prompt_for_batch_gemini_predict.jsonl",
        config=CreateBatchJobConfig(dest=output_uri),
    )

it is using CreateBatchJobConfig, it has same thinkingconfig as GenerateContentConfig ?

ericdong · May 21, 2025, 4:30pm

GenerationConfig should be in input JSONL or the BigQuery table. See examples here: https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/batch-prediction/intro_batch_prediction.ipynb

Note that Gemini 2.5 Flash does not yet support batch prediction.

psycic03 · July 11, 2025, 6:15am

Can you please help with how to turn thinking off in the recently launched batch api mode?

I’ve raised an issue on google-cookbook about this but no responses yet.

ericdong · July 29, 2025, 5:22pm

@psycic03 I missed your message eariler. I hope that you have resolved this issue.
The key is to include the generation_config with the thinking_budget=0 in the batch of input data.

siva_krishna · September 17, 2025, 1:15pm

@ericdong

“generationConfig”: {“response_mime_type”: “application/json”, “temperature”: 0.0, “top_p”: 0, “max_output_tokens”: 512, “thinking_budget”: 0}

**This is throwing an error:** “fieldViolations”: [{“field”: “generation_config”, “description”: “Invalid JSON payload received. Unknown name \“thinking_budget\” at ‘generation_config’: Cannot find field.”}]}

ericdong · September 17, 2025, 1:32pm

@siva_krishna thinking_budget is in thinking_config of generation_config. Can you see if this notebook helps: generative-ai/gemini/getting-started/intro_gemini_2_5_flash.ipynb at main · GoogleCloudPlatform/generative-ai · GitHub ? Thanks.

siva_krishna · September 17, 2025, 5:28pm

Awesome, this is working great. We have to pass it as a string instead of a ThinkingConfig object to work in “.jsonl” format

generation_config = {‘thinking_config’: {“thinking_budget”: 0} }

Topic		Replies	Views
How can I set 0 "thinkingBudget" with Vertex AI Gemini 2.5Flash？ I wanna disable it Custom ML & MLOps vertex-ai-platform	5	666	September 24, 2025
Performance degradation when using Batch prediction Custom ML & MLOps gemini-in-looker , vertex-ai-platform	2	203	April 8, 2025
vertex ai Gemini 1.5 batch prediction with multi turn Custom ML & MLOps vertex-ai-platform	2	172	September 5, 2024

Turn off thinking for gemini-2.5-flash batch api in Vertex AI

AI Suggested topics