Peculiar Pricing of Context Caching and Potential Plans for Prefix Caching Support

For reference, the standard input price for Gemini 1.5 Pro is $1.25 per million tokens.

Meanwhile, context caching storage is priced at $4.50 per million tokens per hour, and cached input tokens (cache hits) are priced at $0.31 per million tokens.

This creates a peculiar situation. Instead of making a standard request at $1.25 per million tokens, I could first create a cache with a 1-minute TTL. For a prompt of, say, 1 million tokens, that one minute of storage would cost $4.50 / 60 = $0.075. The subsequent request would then be a guaranteed cache hit, costing $0.31, for a total of $0.385. That is significantly lower than the standard request price, which implies I could make all my requests much cheaper by first creating a 1-minute cache and then sending the actual request.
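To make the arithmetic explicit, here is a quick back-of-the-envelope sketch in Python (it reproduces my assumption above that the cache-creation call itself adds no token cost, which may well not hold in practice):

```python
# Back-of-the-envelope cost comparison, using the prices quoted above
# (all figures in USD per million tokens).
STANDARD_INPUT = 1.25      # standard request price
STORAGE_PER_HOUR = 4.50    # cache storage, per million tokens per hour
CACHE_HIT = 0.31           # cached input tokens

millions_of_tokens = 1.0   # a 1M-token prompt
ttl_minutes = 1            # deliberately short-lived cache

storage = STORAGE_PER_HOUR * (ttl_minutes / 60) * millions_of_tokens
cached_total = storage + CACHE_HIT * millions_of_tokens
standard_total = STANDARD_INPUT * millions_of_tokens

print(f"standard request:  ${standard_total:.3f}")          # $1.250
print(f"cache + cache hit: ${cached_total:.3f}")            # $0.385
print(f"saving: {1 - cached_total / standard_total:.0%}")   # 69%
```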

Secondly, I find the current implementation of context caching very difficult to use, especially in multi-turn sessions. My goal is to have the user’s input and the AI’s response added to a new cache after each turn.

In Anthropic’s Claude, this can be achieved by setting a breakpoint at the latest message, which automatically caches the preceding conversation. I believe this approach is much more flexible. Are there any plans to support this kind of prefix/prompt caching?

Hi Junity,

Welcome to Google Cloud Community!

Explicit caching lets you upload content once and reuse it across requests. You pay:

  • $4.50 per million tokens per hour for storage
  • $0.31 per million tokens for cache hits

Standard input tokens cost $1.25 per million, so yes: a 1-minute cache plus a cache hit comes to ~$0.385 per million tokens, roughly a 70% discount.

However, this assumes that:

  • you always hit the cache (which isn’t guaranteed unless you tightly control prompt structure and timing),
  • your cached content is large enough to justify the overhead, and
  • you’re not incurring extra costs from non-cached tokens or output tokens.

In addition, Gemini’s caching isn’t optimized for dynamic, evolving conversations. It treats cached content as a static prefix, and there’s no built-in way to append new turns to the cache after each exchange.
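To illustrate the static-prefix model, here is roughly what explicit caching looks like with the google-generativeai Python SDK (a minimal sketch; the model version, document, and TTL are placeholders, and the content must meet the model’s minimum cacheable token count):

```python
import datetime

import google.generativeai as genai
from google.generativeai import caching

long_document = "..."  # placeholder for a large, static context

# The cache is created once and its contents are frozen as a prefix.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-pro-001",  # placeholder model version
    display_name="static-prefix-demo",
    contents=[long_document],
    ttl=datetime.timedelta(minutes=5),
)

# Requests through the cache reuse the stored prefix at the cache-hit rate.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Summarize the document.")
print(response.text)

# There is no append operation: to fold a new conversation turn into the
# cached prefix, you would have to create a brand-new cache that includes it.
```

Note that the only way to “grow” the prefix is to pay for a new cache each turn, which is exactly the friction you’re describing.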

By contrast, Anthropic’s Claude prompt caching allows you to set a cache_control breakpoint at any point in the prompt. This means:

  • You can cache everything up to the latest message
  • The cache is refreshed automatically when reused
  • It’s ideal for chat-style interactions where context grows turn by turn
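For comparison, a minimal sketch of that pattern with the Anthropic Python SDK (model name and messages are placeholders; the cache_control breakpoint is Anthropic’s documented mechanism):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

history = [
    {"role": "user", "content": "…earlier turn…"},
    {"role": "assistant", "content": "…earlier reply…"},
]

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=1024,
    messages=history + [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Latest user message",
                    # Breakpoint: everything up to and including this block
                    # becomes the cached prefix for reuse on the next request.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        }
    ],
)
```

Because the breakpoint simply marks a prefix of an ordinary request, you can move it forward each turn and the previously cached prefix is reused, which is what makes it convenient for conversations that grow turn by turn.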

As of now, there’s no public roadmap confirming Gemini will adopt Claude-style prefix caching. I suggest keeping an eye on the release notes for future updates.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

Isn’t context caching charged on an hourly basis?

Edit: Oh wait, I saw the 5-minute TTL in the documentation.

Here are my thoughts.

  1. Your first request will not be counted as cached, so it will be billed at the normal price, as far as I know. Only after that is the state saved by the caching mechanism.
  2. Are there any settings to keep the cache alive? Even if we set the TTL to 1 hour, why not refresh the cache (or at least not kill it) if I hit it at the 55-minute mark? As you stated, Anthropic works this way; does Gemini have this feature? @ruthseki thx!
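The closest workaround I can think of would be manually bumping the TTL before expiry (a minimal sketch, assuming the SDK’s documented CachedContent.get and update methods; the cache name is a placeholder), but that still isn’t an automatic refresh-on-hit:

```python
import datetime

from google.generativeai import caching

# Manual keep-alive: re-fetch the cache and push its TTL out again
# shortly before it would expire (e.g. at the 55-minute mark).
cache = caching.CachedContent.get(name="cachedContents/your-cache-id")  # placeholder
cache.update(ttl=datetime.timedelta(hours=1))  # restarts the 1-hour clock
```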