Build Fort Knox for GenAI: Defeat the "Confused Deputy" on Cloud Run

aniketagrawal · February 3, 2026, 10:33am

In the rush to build Generative AI agents—bots that check bank balances, approve loans, or book flights—developers often focus entirely on the “Brain” (the LLM) and neglect the “Body” (the infrastructure).

This leads to a classic yet dangerous vulnerability in microservices: The Confused Deputy.

If you secure your backend API by simply asking “Is this a valid Google Cloud Service Account?”, you are effectively leaving the door open. If an attacker steals a valid token meant for a different internal service (e.g., a non-sensitive log viewer), they can use that valid identity to unlock your production banking vault.

In this deep dive, we will deconstruct a Defense-in-Depth architecture on Google Cloud Run. We will simulate a “Red Team” attack using generic credentials and then implement the “Blue Team” fix: Audience (aud) Validation.

The Vulnerability: Identity is Not Enough

In a naive “Zero Trust” setup, developers often rely solely on Authentication (verifying who the caller is). However, for high-security workloads, we must also verify Intent (verifying where the caller intended to send the request).

Without verifying intent, a generic token issued for any service in your project can be replayed against your specific service.
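To make the distinction concrete, here is a minimal sketch of an intent check on claims that have already been decoded and signature-verified. The function name `audience_matches`, the example URLs, and the claim values are hypothetical and exist only for illustration. This is not the check Cloud Run itself runs.

```python
from typing import Any, Mapping


def audience_matches(claims: Mapping[str, Any], expected_audience: str) -> bool:
    """Return True only if the token was minted for expected_audience."""
    aud = claims.get("aud")
    if aud is None:
        return False
    # RFC 7519 allows "aud" to be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else list(aud)
    return expected_audience in audiences


# Hypothetical claims: same caller identity, different intended destinations.
vault_url = "https://banking-vault.example.com"
generic_claims = {"sub": "agent-sa", "aud": "https://log-viewer.example.com"}
targeted_claims = {"sub": "agent-sa", "aud": vault_url}

print(audience_matches(generic_claims, vault_url))   # False: valid identity, wrong target
print(audience_matches(targeted_claims, vault_url))  # True: identity and intent both match
```

Both tokens identify the same caller; only the second one says the caller meant to reach the vault.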

Architecture Diagram: The Confused Deputy Attack

In this scenario, an attacker possesses a valid Google Cloud credential. However, the token they generate is generic (lacking a specific target). The Cloud Run IAM Proxy acts as a firewall, inspecting the token’s claims and rejecting it.

+-----------------------------------------------------------------------+
|  Google Cloud Project (VPC-SC Perimeter)                              |
|                                                                       |
|  +--------------+             +------------------------------------+  |
|  |   Attacker   |             |         Cloud Run Service          |  |
|  | (Valid User/ |             |       (Banking Vault API)          |  |
|  | Service Acct)|             |                                    |  |
|  |              |             |   +----------------------------+   |  |
|  | [Generic JWT]------X------>|   | IAM Proxy (The Gatekeeper) |   |  |
|  |              |   (403)     |   | Check: aud == service_url? |   |  |
|  +--------------+             |   +----------------------------+   |  |
|                               +------------------------------------+  |
+-----------------------------------------------------------------------+
       ^
       |
(Public Internet - Blocked via --no-allow-unauthenticated)

Step 1: Deploying the “JWT-Aware” Vault

Let’s deploy a Python microservice that acts as our sensitive banking backend. We will deploy it with the --no-allow-unauthenticated flag, forcing all traffic to pass through Google’s IAM layer [^1].

Crucially, our app includes a /debug endpoint to inspect the incoming security tokens for educational purposes.

main.py

import os
import jwt
from flask import Flask, jsonify, request

app = Flask(__name__)

# The Sensitive Data
MORTGAGE_RATES = {
    "provider": "SOL-Secure-Backend",
    "tier": "INSTITUTIONAL_VIP",
    "rate": 5.25
}

@app.route('/rates')
def get_rates():
    # If requests reach here, Cloud Run has already validated the signature.
    return jsonify(MORTGAGE_RATES)

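@app.route('/debug')
def debug_token():
    # Forensics: Let's look inside the token
    auth_header = request.headers.get('Authorization', '')
    # partition() avoids an IndexError when the header is missing or malformed
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return jsonify({"error": "missing bearer token"}), 400
    try:
        # Decode without verifying signature just for inspection
        decoded = jwt.decode(token, options={"verify_signature": False})
    except jwt.InvalidTokenError:
        return jsonify({"error": "malformed token"}), 400
    return jsonify({
        "token_type": "JWT",
        "audience_claim": decoded.get("aud", "MISSING")
    })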

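if __name__ == "__main__":
    # Cloud Run passes the listening port in the PORT environment variable
    app.run(host='0.0.0.0', port=int(os.environ.get("PORT", 8080)))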

Deployment:

gcloud run deploy banking-vault \
  --source . \
  --region us-central1 \
  --no-allow-unauthenticated

Step 2: The “Red Team” Audit

Imagine you are a developer with valid access to the Google Cloud project. You assume you can access the service because you are “authenticated.”

Attempt 1: The Access Token (OAuth2)

You try using the standard CLI access token typically used for managing resources (Control Plane).

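SERVICE_URL=$(gcloud run services describe banking-vault --region us-central1 --format='value(status.url)')
TOKEN=$(gcloud auth print-access-token)
curl -H "Authorization: Bearer $TOKEN" "$SERVICE_URL/rates"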

Result: HTTP 401 Unauthorized.

Why: Cloud Run service-to-service authentication requires an OIDC Identity Token, not an OAuth Access Token [^2]. Access tokens authorize actions on resources (e.g., “Delete this VM”), whereas Identity tokens prove who you are to a service.
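The difference is visible in the shape of the token itself. The sketch below is a rough heuristic, not an official classifier: it assumes an ID token is a three-part JWT whose header is JSON containing an `alg` field, and that an access token is an opaque string. The helper names and both sample tokens are made up for illustration.

```python
import base64
import json


def looks_like_jwt(token: str) -> bool:
    """Heuristic: three dot-separated parts and a JSON header with an "alg" field."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    try:
        padded = parts[0] + "=" * (-len(parts[0]) % 4)
        header = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return False
    return isinstance(header, dict) and "alg" in header


def _b64url(obj: dict) -> str:
    """Encode a dict the way JWT segments are encoded (base64url, no padding)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Fabricated sample values, not real credentials.
fake_id_token = ".".join([_b64url({"alg": "RS256", "typ": "JWT"}),
                          _b64url({"aud": "https://example.com"}),
                          "signature"])
fake_access_token = "ya29.opaque-example-value"

print(looks_like_jwt(fake_id_token))      # True
print(looks_like_jwt(fake_access_token))  # False
```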

Attempt 2: The Generic ID Token

You correct your mistake and generate a valid OIDC Identity Token.

TOKEN=$(gcloud auth print-identity-token)
curl -H "Authorization: Bearer $TOKEN" $SERVICE_URL/rates

Result: HTTP 403 Forbidden.

The “Confused Deputy” moment: You have a valid ID, signed by Google, but Cloud Run rejects you.
Why: By default, gcloud auth print-identity-token generates a token with a generic audience. Cloud Run inspects the Audience (aud) claim, sees it does not match the URL of the banking service, and blocks the request to prevent token replay attacks [^3].
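You can check the audience yourself before sending a token anywhere. The following is a minimal sketch that decodes the payload segment of a JWT with the standard library only. It does not verify the signature, so it is suitable for inspection and never for authorization. The `peek_claims` helper and the sample token are hypothetical.

```python
import base64
import json


def peek_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (inspection only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT")
    payload = parts[1]
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def _b64url(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JWT segment."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Fabricated token standing in for the output of `gcloud auth print-identity-token`.
sample_token = ".".join([_b64url({"alg": "RS256"}),
                         _b64url({"aud": "generic-client-id.example", "sub": "dev-user"}),
                         "signature"])

service_url = "https://banking-vault.example.com"
claims = peek_claims(sample_token)
print("aud claim:", claims["aud"])
print("matches service URL:", claims["aud"] == service_url)
```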

Step 3: The “Blue Team” Fix (Audience Validation)

To successfully connect your GenAI Agent to this backend, you must perform Targeted Token Minting.

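We use the --audiences flag to embed the destination URL in the token's aud claim. Because that claim is part of the signed payload, it cannot be altered after minting without invalidating Google's signature. This proves Intent.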

Architecture Diagram: The Secure Handshake

In the secure flow, the token carries the destination address inside it. The Gatekeeper validates this before allowing entry.

+-----------------------------------------------------------------------+
|  Google Cloud Project (Trusted Flow)                                  |
|                                                                       |
|  +--------------+             +------------------------------------+  |
|  |   AI Agent   |             |         Cloud Run Service          |  |
|  |(Service Acct)|             |       (Banking Vault API)          |  |
|  |              |             |                                    |  |
|  | 1. Mint Token|             |   +----------------------------+   |  |
|  |    (Targeted)|------------>|   | IAM Proxy (The Gatekeeper) |   |  |
|  |              |   200 OK    |   | Check: aud == service_url? |   |  |
|  +--------------+             |   |      Result: MATCH         |   |  |
|                               |   +-------------+--------------+   |  |
|                               |                 |                  |  |
|                               |          +------+-------+          |  |
|                               |          |  Flask App   |          |  |
|                               |          +--------------+          |  |
+-----------------------------------------------------------------------+

The Execution

Generate the Targeted Token:
We explicitly state where this token will be used.
```
TARGET_TOKEN=$(gcloud auth print-identity-token --audiences=$SERVICE_URL)
```

Access the Vault:

curl -H "Authorization: Bearer $TARGET_TOKEN" $SERVICE_URL/rates

Result: HTTP 200 OK. You receive the full MORTGAGE_RATES payload: {"provider": "SOL-Secure-Backend", "rate": 5.25, "tier": "INSTITUTIONAL_VIP"}.

Forensics (Why it worked):
Send the token to our debug endpoint.
```
curl -s -H "Authorization: Bearer $TARGET_TOKEN" $SERVICE_URL/debug
```
Output:
```
{
  "audience_claim": "https://banking-vault-xyz.a.run.app", 
  "token_type": "JWT"
}
```
Because the audience_claim inside the token exactly matches the URL of the service receiving it, the Cloud Run IAM Proxy allows the request to pass.
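An agent running as a service account will usually mint the token in code rather than through gcloud. The sketch below assumes the `google-auth` library (with its `requests` transport) is installed and that Application Default Credentials are available in the runtime. `build_auth_headers` and its `token_source` parameter are hypothetical names introduced here so the minting step can be swapped out in tests.

```python
from typing import Callable, Dict


def fetch_google_id_token(audience: str) -> str:
    """Mint an ID token for `audience` using Application Default Credentials."""
    # Imported lazily so the rest of this sketch loads without google-auth installed.
    import google.auth.transport.requests
    import google.oauth2.id_token

    auth_request = google.auth.transport.requests.Request()
    return google.oauth2.id_token.fetch_id_token(auth_request, audience)


def build_auth_headers(
    service_url: str,
    token_source: Callable[[str], str] = fetch_google_id_token,
) -> Dict[str, str]:
    """Return headers carrying a token whose audience is the target service URL."""
    # Assumption: the audience is the service's base URL without a trailing slash.
    audience = service_url.rstrip("/")
    return {"Authorization": f"Bearer {token_source(audience)}"}
```

In the agent, `build_auth_headers(SERVICE_URL)` would be passed as the headers of the HTTP call to `/rates`. Minting one token per target service keeps each token useless against every other service.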

Conclusion

When building Hybrid AI Agents using Vertex AI Playbooks or Dialogflow CX, security is paramount. You are effectively giving an LLM access to your internal API surface.

Zero Trust: Never expose backend APIs to the public internet (--no-allow-unauthenticated).
Identity: Always use Service Accounts for agent execution.
Intent: Ensure your Agent is minting OIDC tokens with the specific Audience of the target tool.

By enforcing Audience Validation, you ensure that even if a token is leaked from a lower-security development environment, it cannot be used to “confuse the deputy” and access your production banking data.

References:
[^1]: Cloud Run Authentication Overview
[^2]: Google Cloud Authentication: Token Types
[^3]: Service-to-Service Authentication (Audience Validation)
