Turn off thinking for gemini-2.5-flash batch api in Vertex AI

gemini-2.5-flash-preview-05-20 support batch predications now (2025-05-21), doc is here:

Gemini 2.5 Flash | Generative AI on Vertex AI | Google Cloud

I want to use gemini-2.5-flash batch api (Python) in Vertex AI, I want to turn off thinking (thinkbudget=0), How to turn it off in config?

# Turn off thinking

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20"
    contents="What is AI?",
    config=GenerateContentConfig(
        thinking_config=ThinkingConfig(
            thinking_budget=0,
        )
    ),
)

You can add config in your batch input.
Please see more details about thinking config in https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_2_5_flash.ipynb

vertex gemini batch api example

from google import genai
    from google.genai.types import CreateBatchJobConfig, JobState, HttpOptions

    client = genai.Client(http_options=HttpOptions(api_version="v1"))

    # See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.batches.Batches.create
    job = client.batches.create(
        model="gemini-2.5-flash-preview-05-20",
        # Source link: https://storage.cloud.google.com/cloud-samples-data/batch/prompt_for_batch_gemini_predict.jsonl
        src="gs://cloud-samples-data/batch/prompt_for_batch_gemini_predict.jsonl",
        config=CreateBatchJobConfig(dest=output_uri),
    )

it is using CreateBatchJobConfig, it has same thinkingconfig as GenerateContentConfig ?

GenerationConfig should be in input JSONL or the BigQuery table. See examples here: https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/batch-prediction/intro_batch_prediction.ipynb

Note that Gemini 2.5 Flash does not yet support batch prediction.

Can you please help with how to turn thinking off in the recently launched batch api mode?

I’ve raised an issue on google-cookbook about this but no responses yet.

@psycic03 I missed your message eariler. I hope that you have resolved this issue.
The key is to include the generation_config with the thinking_budget=0 in the batch of input data.

@ericdong

ā€œgenerationConfigā€: {ā€œresponse_mime_typeā€: ā€œapplication/jsonā€, ā€œtemperatureā€: 0.0, ā€œtop_pā€: 0, ā€œmax_output_tokensā€: 512, ā€œthinking_budgetā€: 0}

**This is throwing an error:** ā€œfieldViolationsā€: [{ā€œfieldā€: ā€œgeneration_configā€, ā€œdescriptionā€: ā€œInvalid JSON payload received. Unknown name \ā€œthinking_budget\ā€ at ā€˜generation_config’: Cannot find field.ā€}]}

@siva_krishna thinking_budget is in thinking_config of generation_config. Can you see if this notebook helps: generative-ai/gemini/getting-started/intro_gemini_2_5_flash.ipynb at main Ā· GoogleCloudPlatform/generative-ai Ā· GitHub ? Thanks.

Awesome, this is working great. We have to pass it as a string instead of a ThinkingConfig object to work in ā€œ.jsonlā€ format

generation_config = {ā€˜thinking_config’: {ā€œthinking_budgetā€: 0} }

1 Like