The dilemma you describe is an age-old problem. When using a cache, there is a tradeoff between performance and accuracy. In a distributed system, the accuracy challenge compounds, because there are multiple independent caches, each of which maintains its own cached values.
You will have to decide how you want to balance that tradeoff. A cache expiry of 1 second gives a 1-second “maximum window of inaccuracy” for any given cache. The respective caches can still be inaccurate, but for no more than 1 second at a time. The downside is that the system will incur greater cost fetching or computing values, because the cache will often be cold. Could you tolerate a wider window of inaccuracy? 10 seconds? 30 seconds? What happens when the token is stale? Could you have Apigee refresh its cache and then retry the upstream call? Could you have the client retry?
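To illustrate that window, here is a minimal Python sketch of a single node's cache with a time-to-live. It is not Apigee code; `TtlCache`, the `loader` callback, and the fake clock are all names I made up for illustration. The fake clock just makes the timing repeatable.

```python
from typing import Callable, Dict, Tuple


class TtlCache:
    """One node's cache: an entry is trusted for ttl_seconds after it is loaded."""

    def __init__(self, ttl_seconds: float,
                 loader: Callable[[str], str],
                 clock: Callable[[], float]) -> None:
        self.ttl_seconds = ttl_seconds
        self.loader = loader      # fetches the authoritative value (the expensive part)
        self.clock = clock        # injected so the example does not depend on real time
        self.loads = 0            # how many times we paid the cost of a fetch
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> str:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self.ttl_seconds:
            return entry[0]       # still inside the TTL: may be stale, but cheap
        value = self.loader(key)  # expired or missing: pay for a fresh read
        self.loads += 1
        self._entries[key] = (value, now)
        return value


source = {"token": "v1"}          # stands in for the authoritative store
now = [0.0]                       # fake clock, in seconds
cache = TtlCache(1.0, lambda k: source[k], lambda: now[0])

first = cache.get("token")        # t=0.0: cold cache, loads "v1"
source["token"] = "v2"            # the authoritative value changes
now[0] = 0.5
stale = cache.get("token")        # t=0.5: inside the window, still "v1"
now[0] = 1.0
fresh = cache.get("token")        # t=1.0: entry expired, reloads "v2"
```

A longer TTL means fewer loads but a longer stretch in which `stale` can differ from the source. Every node running its own copy of this cache has its own independent window.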
In Apigee, there is no way to “join” the disparate caches across all the nodes and synchronize them.
My suggestion to you is to try different approaches and measure the behavior for correctness and performance.
Tony Hoare and Donald Knuth have been known to say, “Premature optimization is the root of all evil.” By invoking this aphorism, I am NOT suggesting that the performance of various options is irrelevant. Rather, I am suggesting that you MEASURE the effects, rather than assuming you know what the relative performance impacts will be. You wrote, “Also, this creates lot of performance overhead.” How much? Compared to what? Are you sure? Have you measured?
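If you don't already have a load-testing tool in place, even a small timing harness will answer “how much?” better than a guess. The following is a rough Python sketch, not a replacement for a proper load test. `measure_latency` is a hypothetical helper, and the `sum(range(1000))` call is only a placeholder for whatever request you actually want to time, such as a call through the proxy with one cache configuration or another.

```python
import statistics
import time


def measure_latency(call, runs=200):
    """Invoke `call` repeatedly and summarize the wall-clock latency in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "median": statistics.median(samples),
        "p95": samples[int(0.95 * (len(samples) - 1))],  # simple nearest-rank estimate
        "max": samples[-1],
    }


# Placeholder workload; swap in the real request you want to compare.
stats = measure_latency(lambda: sum(range(1000)), runs=50)
```

Run the same harness against each option you are considering and compare the medians and the tail, rather than a single request.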
Some other things to consider:
- You said the KVM gets updated every 5 minutes. Is it possible to design the system so that when you populate the KVM, you ALSO PUT to a cache? The KVM Get operation has an implicit cache, but maybe you can wrap the Apigee Cache explicitly around the KVM. The Apigee Cache gives you more control: you can invalidate an entry, or populate an entry explicitly, whenever you want. And you can scope the cache to an API proxy or to some wider scope.
- Is it possible to reduce the frequency of the KVM update?
- Is it possible to introduce some retry logic so that you TRY the token periodically and only refresh when the token is bad?
- Is there some other way to work around the problem?
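To make the write-through and retry ideas in the list above concrete, here is a rough Python sketch of the logic. It is not Apigee policy configuration: plain dicts stand in for the KVM and for one node's cache, and `publish_token`, `call_upstream_with_refresh`, `Unauthorized`, and `make_upstream` are hypothetical names I made up for illustration.

```python
class Unauthorized(Exception):
    """Raised by the stand-in upstream when it rejects a token."""


def publish_token(kvm, cache, token):
    """Write-through: whenever the KVM is updated, populate the cache too."""
    kvm["token"] = token
    cache["token"] = token


def call_upstream_with_refresh(kvm, cache, upstream):
    """Try the cached token; refresh from the KVM and retry once only if rejected."""
    token = cache.get("token")
    if token is None:
        token = cache["token"] = kvm["token"]
    try:
        return upstream(token)
    except Unauthorized:
        cache["token"] = kvm["token"]    # the cached copy was stale, so reload it
        return upstream(cache["token"])  # a second failure propagates to the caller


def make_upstream(valid_token):
    """Hypothetical upstream that accepts exactly one token and records each attempt."""
    attempts = []

    def upstream(token):
        attempts.append(token)
        if token != valid_token:
            raise Unauthorized(token)
        return "ok"

    return upstream, attempts


# This node's cache still holds the old token after the KVM was rotated.
kvm = {"token": "new"}
node_cache = {"token": "old"}
upstream, attempts = make_upstream("new")
result = call_upstream_with_refresh(kvm, node_cache, upstream)
```

With the retry in place, a stale cache costs one extra upstream call instead of a failed request, which may let you tolerate a longer cache expiry. Whether that trade is worth it is, again, something to measure.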
You might think, “by taking any of these steps, I’m just forcing the Apigee policies and services to behave in a way that is not harmonious with their design. I should just introduce another element, something like a Redis cache, that is designed for caching. That will solve my cache coherence problem more elegantly.” But that’s probably not valid. Introducing an external cache, like Redis or GCP Memorystore, WILL provide a cache. But you will have the same issue, the same tradeoff of performance vs. accuracy. You wouldn’t be solving anything; you’d just be shifting the problem to a different element in the system. You’d still have to measure and experiment.