The reason you see intermittent behavior is that there is a cache, and it is per-server.
The Edge service is “serverless” from your point of view: you don’t configure servers and you don’t worry about servers, but there are actual discrete servers running the code behind OAuthV2/VerifyAccessToken.
What happens is this:
In step 2, the server that receives your request bearing the access token (let’s call that token “generation 1”, or g1) reads from the token store and caches the result. Because the g1 token is valid and not expired, VerifyAccessToken (VAT) succeeds. The cache now contains that token.
If you were to present the g1 token again for verification, VAT would read directly from its cache and would see that the token is valid and not expired.
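To make that concrete, here’s a rough Python sketch of a per-server, read-through token cache with a TTL. This is just my illustration, not the actual Edge implementation; the names (TokenCache, token_store_lookup) and the exact placement of the 180-second TTL are invented.

```
import time

CACHE_TTL_SECONDS = 180  # approximate per-server cache lifetime

class TokenCache:
    """Per-server, read-through cache keyed by access token."""
    def __init__(self):
        self._entries = {}  # token -> (status, cached_at)

    def lookup(self, token, token_store_lookup):
        entry = self._entries.get(token)
        if entry is not None:
            status, cached_at = entry
            if time.time() - cached_at < CACHE_TTL_SECONDS:
                return status            # served from cache; token store not consulted
            del self._entries[token]     # entry expired; fall through to the store
        status = token_store_lookup(token)           # read the persistent token store
        self._entries[token] = (status, time.time())
        return status

# Example: the second lookup within the TTL never touches the token store.
cache = TokenCache()
cache.lookup("g1", lambda tok: "approved")   # reads the store, warms the cache
cache.lookup("g1", lambda tok: "revoked")    # still returns "approved", served from cache
```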
In step 3, you use RefreshAccessToken. This delivers a new access token (g2) and invalidates the previous g1 access token in the token store. Important: it does not invalidate the per-server token cache.
Presenting the g2 token to VAT succeeds.
Presenting the g1 token to VAT may succeed or fail, depending on which server handles the request. If a server that has previously seen the token handles the request, it reads from its cache, and if the cache entry has not expired (the TTL is usually 180 seconds), the token is treated as valid. If the cache entry has expired (180 seconds after the server first saw the token), the server reads the persistent token store and sees that the token is no longer valid. If the server has never seen the token before, its cache is empty, so it likewise reads the token store and returns a result indicating that the g1 token is not valid.
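Here’s a hedged simulation of that scenario in Python, again with invented names and a shared dictionary standing in for the persistent token store. It models two servers, each with its own cache: after the refresh, the server with the warm cache keeps approving g1 until its cache entry ages out, while the cold server consults the store and rejects it.

```
import time

CACHE_TTL_SECONDS = 180

# One shared, persistent token store: token -> "approved" or "revoked"
token_store = {"g1": "approved"}

class Server:
    """One Edge server, holding its own local token cache."""
    def __init__(self, name):
        self.name = name
        self._cache = {}  # token -> (status, cached_at)

    def verify_access_token(self, token, now):
        entry = self._cache.get(token)
        if entry and now - entry[1] < CACHE_TTL_SECONDS:
            return entry[0]                          # answered from the local cache
        status = token_store.get(token, "revoked")   # cache miss or expired: read the store
        self._cache[token] = (status, now)
        return status

def refresh_access_token():
    """RefreshAccessToken: revoke g1 and issue g2 in the store; no cache is touched."""
    token_store["g1"] = "revoked"
    token_store["g2"] = "approved"

s1, s2 = Server("server-1"), Server("server-2")
t0 = time.time()

print(s1.verify_access_token("g1", t0))        # approved: read from store, s1 cache warmed
refresh_access_token()                         # g1 revoked in the store, g2 issued
print(s1.verify_access_token("g1", t0 + 30))   # approved: stale cache on server-1
print(s2.verify_access_token("g1", t0 + 30))   # revoked: cold cache, store is consulted
print(s1.verify_access_token("g1", t0 + 200))  # revoked: server-1's cache entry has expired
```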
Does this make sense to you?
I think it might be nice if RefreshAccessToken invalidated the server-local cache entry for the existing (g1) token on the server that handles the refresh. This would eliminate some of the “false approvals” that occur when the token is verified by a server with a stale cache, but it would not eliminate all of them. Imagine there are three servers, and the cache is warm on two of them. You invoke a proxy that performs RefreshAccessToken on one of those two.
With the current behavior, I believe both of the servers that had previously seen the token will have a stale cache and will validate the g1 token for up to the next 180 seconds.
With the proposed modified behavior, the server that handles the RefreshAccessToken would have a clean cache and would (correctly) treat the g1 token as invalid. The other warm server would still have a stale cache and would treat the g1 token as valid for up to the next 180 seconds.
Therefore this imagined change does not solve the problem completely, and for that reason I think it’s probably not worth making.
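For completeness, here is what that proposed behavior might look like in the same toy model: the server handling the refresh evicts its own cache entry for g1, but the other server with a warm cache is unaffected. This is purely an illustration of the argument above, not something Edge actually does.

```
import time

CACHE_TTL_SECONDS = 180
token_store = {"g1": "approved"}

class Server:
    def __init__(self, name):
        self.name = name
        self._cache = {}  # token -> (status, cached_at)

    def verify_access_token(self, token, now):
        entry = self._cache.get(token)
        if entry and now - entry[1] < CACHE_TTL_SECONDS:
            return entry[0]
        status = token_store.get(token, "revoked")
        self._cache[token] = (status, now)
        return status

    def refresh_access_token(self, old_token, new_token):
        """Proposed behavior: also evict the local cache entry for the old token."""
        token_store[old_token] = "revoked"
        token_store[new_token] = "approved"
        self._cache.pop(old_token, None)   # only THIS server's cache is cleaned

servers = [Server("s1"), Server("s2"), Server("s3")]
t0 = time.time()
for s in servers[:2]:                      # warm the cache on two of the three servers
    s.verify_access_token("g1", t0)

servers[0].refresh_access_token("g1", "g2")            # refresh handled by s1

print(servers[0].verify_access_token("g1", t0 + 30))   # revoked: s1 evicted its entry
print(servers[1].verify_access_token("g1", t0 + 30))   # approved: s2 is still stale
print(servers[2].verify_access_token("g1", t0 + 30))   # revoked: s3 never cached g1
```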
You might think it would be nice if the cache were completely invisible: that calling RefreshAccessToken on one server would invalidate the cache entry for that particular token on all of the servers. That’s a good goal, but for scalability reasons it hasn’t been possible. Imagine not 3 servers but 40; broadcasting token-invalidation notifications to that many caches gets messy. In such a scenario the cache will have a window of staleness.
I hope this clarifies.