I’ve been experimenting with Vertex AI function calling and the preview context caching feature. While following the documentation and the Python examples (codelabs), everything works as expected. However, when implementing the same flow in Node.js, I’ve noticed a major discrepancy, particularly with context caching.
When using the example PDF URIs from the codelab, the token count in Python meets the expected minimum, but in Node.js the token count is significantly lower. Am I missing something here? I am also attaching some screenshots for reference.
From the documentation, my understanding is that context caching should improve response times by retrieving data from cache instead of making external API calls. However, in my tests, I am seeing longer response times instead of improvements. Is my assumption incorrect?
The cached_content.create call fails in Node.js because the calculated token count falls below the required minimum for context caching. This is likely caused by a discrepancy in token counts between Python and Node.js, potentially due to differing tokenization algorithms in their respective Vertex AI SDKs.
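One way to surface this failure earlier is to compare the reported token count against the documented minimum before attempting the create call. This is a hypothetical pre-flight check: the `MIN_CACHE_TOKENS` value and the helper name are assumptions, not part of the SDK — verify the current minimum for your model in the context caching documentation (32,768 tokens was the initial minimum for gemini-1.5 models).

```javascript
// Hypothetical pre-flight check: confirm the token count meets the
// context-caching minimum before calling cached_content.create.
// MIN_CACHE_TOKENS is an assumption -- check the documented minimum
// for the specific model you are caching against.
const MIN_CACHE_TOKENS = 32768;

function canCreateCachedContent(totalTokens, minTokens = MIN_CACHE_TOKENS) {
  return totalTokens >= minTokens;
}

// A count like the one reported later in this thread for a single PDF
// (23258 tokens) falls below the minimum, so the create call would fail.
console.log(canCreateCachedContent(23258)); // false
console.log(canCreateCachedContent(40000)); // true
```

Wiring a check like this in front of the create call turns a vague SDK error into an explicit "token count below minimum" failure you can log and compare between the Python and Node.js runs.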
Here are some potential reasons and suggestions you might consider to address the issue:
Incorrect Content Handling (Node.js): Check for incorrect handling of PDF content encoding in the Node.js client. Also, be aware of potential inconsistencies in PDF parsing (either by Vertex AI’s internal library during download/processing or by pre-processing, if applicable), as these can both contribute to inaccurate tokenization.
API Version/SDK Differences: Ensure you’re using the latest Vertex AI Node.js SDK, as older versions may contain bugs or inconsistencies that can cause behavior to differ between the Python and Node.js clients.
Reproducibility and Isolation: To facilitate debugging, create a minimal Node.js example using a single, simple PDF URI. Moreover, ensure that both your Node.js and Python environments are as consistent as possible, sharing factors like operating system and network configuration.
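To make the two environments directly comparable, it helps to build the request payload identically in both. Below is a minimal sketch of constructing the `contents` array with PDF `fileData` parts in Node.js; the field names (`fileData`, `fileUri`, `mimeType`) follow the Vertex AI content format, but treat the exact shape as an assumption to verify against your SDK version, and note the bucket URIs are placeholders:

```javascript
// Sketch: build a contents array referencing PDFs by URI, with an explicit
// MIME type so tokenization is not affected by type inference.
// Field names are assumptions -- verify against your installed SDK version.
function buildPdfContents(pdfUris) {
  return [
    {
      role: 'user',
      parts: pdfUris.map((uri) => ({
        fileData: { fileUri: uri, mimeType: 'application/pdf' },
      })),
    },
  ];
}

// Placeholder URIs -- substitute the exact codelab URIs used in Python.
const contents = buildPdfContents([
  'gs://my-bucket/doc1.pdf',
  'gs://my-bucket/doc2.pdf',
]);
console.log(JSON.stringify(contents, null, 2));
```

If you pass the same `contents` (same URIs, same MIME types, same order) to the token-counting call in both Python and Node.js and the counts still diverge, that points at the SDK or service rather than your input handling.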
You can also refer to the following documents for more details:
Vertex AI Generative AI Overview: Starting point for understanding Vertex AI’s generative capabilities. While it doesn’t specifically address the Node.js vs. Python issue, it sets the context.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Thank you so much for replying and suggesting possible things to try. I am currently using the latest version of the SDK, which is 1.9.3, and have tried a minimal example. What confuses me is that when I use just 1 PDF file the token count is higher, while when I use 2 PDF files the token count is lower. Attaching screenshots for your reference:
Testing with 1 PDF file, the cached content is 23258 tokens.