Preference tuning now available for Gemini 2.5 Flash models on Vertex AI

Hi everyone,

Preference tuning (DPO) is now supported for Gemini 2.5 Flash and Flash-Lite on Vertex AI, and I want to share a new tutorial for those of you looking to align Gemini models with specific user preferences.

How it works:

  • You provide a JSONL file containing prompts with paired responses: one chosen (preferred) and one rejected.
  • The model adjusts its internal probability distributions to increase the likelihood of the preferred output, without needing a separate reward model.
  • We recommend a two-step approach: first run supervised fine-tuning (SFT) on the preferred responses, then continue tuning from that checkpoint with DPO to refine the behavior.
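To make the dataset format concrete, here is a minimal sketch of building the JSONL file described above. The field names (`prompt`, `chosen`, `rejected`) follow a common DPO convention and are illustrative only; check the Vertex AI documentation for the exact schema the tuning job expects.

```python
import json

# Illustrative preference pairs. Each record carries a prompt plus
# one preferred ("chosen") and one dispreferred ("rejected") response.
# NOTE: field names are a hypothetical sketch, not the official schema.
examples = [
    {
        "prompt": "Summarize this support ticket in one sentence.",
        "chosen": "User cannot log in after the 2.3 update and needs a password reset.",
        "rejected": "The user wrote a long message describing several login problems.",
    },
]

def write_jsonl(records, path):
    """Write one JSON object per line, the shape tuning jobs typically expect."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            # Every record must contain the prompt and both paired responses.
            assert {"prompt", "chosen", "rejected"} <= rec.keys()
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

write_jsonl(examples, "preference_data.jsonl")
```

Once written, the file can be uploaded to Cloud Storage and referenced when creating the tuning job.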

Here you can find the notebook and the documentation.

Happy building!