gemini-2.5-flash-preview-05-20 support batch predications now (2025-05-21), doc is here:
Gemini 2.5 Flash | Generative AI on Vertex AI | Google Cloud
I want to use gemini-2.5-flash batch api (Python) in Vertex AI, I want to turn off thinking (thinkbudget=0), How to turn it off in config?
# Turn off thinking
response = client.models.generate_content(
model="gemini-2.5-flash-preview-05-20"
contents="What is AI?",
config=GenerateContentConfig(
thinking_config=ThinkingConfig(
thinking_budget=0,
)
),
)
You can add config in your batch input.
Please see more details about thinking config in https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_2_5_flash.ipynb
vertex gemini batch api example
from google import genai
from google.genai.types import CreateBatchJobConfig, JobState, HttpOptions
client = genai.Client(http_options=HttpOptions(api_version="v1"))
# See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.batches.Batches.create
job = client.batches.create(
model="gemini-2.5-flash-preview-05-20",
# Source link: https://storage.cloud.google.com/cloud-samples-data/batch/prompt_for_batch_gemini_predict.jsonl
src="gs://cloud-samples-data/batch/prompt_for_batch_gemini_predict.jsonl",
config=CreateBatchJobConfig(dest=output_uri),
)
it is using CreateBatchJobConfig, it has same thinkingconfig as GenerateContentConfig ?
GenerationConfig should be in input JSONL or the BigQuery table. See examples here: https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/batch-prediction/intro_batch_prediction.ipynb
Note that Gemini 2.5 Flash does not yet support batch prediction.
Can you please help with how to turn thinking off in the recently launched batch api mode ?
Iāve raised an issue on google-cookbook about this but no responses yet.
xyzpoka:
CreateBatchJobConfig
@psycic03 I missed your message eariler. I hope that you have resolved this issue.
The key is to include the generation_config with the thinking_budget=0 in the batch of input data.
@ericdong
āgenerationConfigā: {āresponse_mime_typeā: āapplication/jsonā, ātemperatureā: 0.0, ātop_pā: 0, āmax_output_tokensā: 512, āthinking_budgetā: 0}
**This is throwing an error:** āfieldViolationsā: [{āfieldā: āgeneration_configā, ādescriptionā: āInvalid JSON payload received. Unknown name \āthinking_budget\ā at āgeneration_configā: Cannot find field.ā}]}
Awesome, this is working great. We have to pass it as a string instead of a ThinkingConfig object to work in ā.jsonlā format
generation_config = {āthinking_configā: {āthinking_budgetā: 0} }
1 Like