I’m running a private GKE cluster (nodes have no external IPs) and I’ve configured egress proxy VMs (based on Compute Engine instances with --can-ip-forward and simple MASQUERADE via iptables). In my VPC, I defined custom static routes using next-hop-instance and network tags to route all egress traffic from nodes in each zone through the appropriate egress proxy.
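For reference, each route and the proxy NAT setup were created roughly along these lines (route/tag/instance names, zone, and interface are placeholders):

```
# Tag-based static route: send all egress from tagged nodes through the zonal proxy.
# Priority 800 beats the default internet route (priority 1000).
gcloud compute routes create egress-via-proxy-a \
  --network=my-vpc \
  --destination-range=0.0.0.0/0 \
  --priority=800 \
  --tags=egress-proxy-a \
  --next-hop-instance=egress-proxy-vm-a \
  --next-hop-instance-zone=us-central1-a

# On each proxy VM: enable forwarding and masquerade outbound traffic
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
```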
I assign the corresponding network tags to GKE nodes using node_config.tags in the Terraform google_container_node_pool resource. I’ve confirmed the tags are applied correctly on each VM (node).
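For example, checked per node with something like this (NODE_NAME and ZONE are placeholders):

```
gcloud compute instances describe NODE_NAME --zone=ZONE \
  --format="value(tags.items)"
```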
Also, the ip-masq-agent is configured and running with a custom ConfigMap, and the expected nonMasqueradeCIDRs are in place.
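The agent's configuration can be inspected with (assuming GKE's default ConfigMap name and namespace):

```
kubectl -n kube-system get configmap ip-masq-agent -o yaml
# data.config should list the expected nonMasqueradeCIDRs entries
```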
Issue:
Despite this, I don’t see any traffic reaching the egress proxies from the pods. The proxies themselves work fine — curl to external resources works when run directly from them. But from the pods, even when reaching public IPs, there’s no sign of traffic at the proxies (verified with tcpdump, iptables LOG rules, and dmesg).
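A typical capture of this kind on a proxy looks like the following (the interface name and the 10.0.0.0/8 aggregate are placeholders for your node/pod ranges):

```
# Expect forwarded packets from node/pod sources toward external destinations
sudo tcpdump -ni eth0 src net 10.0.0.0/8 and not dst net 10.0.0.0/8
```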
Question:
Does tag-based custom route selection actually work for GKE nodes when using node_config.tags? Or does GKE internally route traffic in a way that bypasses such custom VPC routing?
I’d really appreciate any clarification on whether this is expected behavior — and if so, what the recommended approach is to enforce egress through proxy VMs in a private cluster.
Thank you in advance!
Hi @a_aleinikov,
Welcome to Google Cloud Community!
Kindly verify that your VPC firewall rules allow traffic from the IP ranges assigned to GKE nodes and pods to reach the proxy VMs. A higher-priority deny rule can block this traffic even though the implied allow-egress rule exists.
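A minimal ingress allow-rule sketch, assuming placeholder network, tag, and CIDR values:

```
gcloud compute firewall-rules create allow-gke-to-proxy \
  --network=my-vpc \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=all \
  --source-ranges=10.0.0.0/24,10.4.0.0/14 \
  --target-tags=egress-proxy
```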
Google Cloud’s routing order evaluates subnet routes before custom static routes, and among static routes to the same destination the lowest priority value wins. For a tag-based 0.0.0.0/0 route to take effect, its priority value must be lower than the default internet route’s priority of 1000. Note that VPC routes are programmed in the virtual network, not in the guest OS, so they will not appear in ip route output on a GKE node; list them with gcloud instead (see the snippet below).
Also confirm that each route’s network tags exactly match the tags applied to your GKE nodes and that its next-hop instance is the intended proxy VM.
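A quick way to review both, with my-vpc as a placeholder:

```
# Compare priority, tags, and next hop of the default route vs. your custom routes
gcloud compute routes list --filter="network:my-vpc" \
  --format="table(name,destRange,priority,tags.list(),nextHopInstance,nextHopGateway)"
```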
For an alternative approach, you may use Cloud NAT, which provides managed egress for private GKE nodes and pods and removes the need for proxy VMs entirely.
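A minimal Cloud NAT sketch (router/NAT names and region are placeholders):

```
gcloud compute routers create nat-router \
  --network=my-vpc --region=us-central1
gcloud compute routers nats create gke-egress-nat \
  --router=nat-router --router-region=us-central1 \
  --auto-allocate-nat-external-ips \
  --nat-all-subnet-ip-ranges
```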
Finally, verify whether your GKE cluster and the egress proxy VMs are in different VPC networks. If they are, this could be the primary reason pod traffic isn’t reaching the proxies: tag-based custom routes in the GKE cluster’s VPC cannot direct traffic to proxy VMs in another VPC unless VPC Network Peering with custom route exchange is configured. The same caution applies to Shared VPC setups, where the routes are defined in the host project.
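These checks can help spot that (HOST_PROJECT_ID and my-vpc are placeholders):

```
# Shared VPC: routes live in the host project, not the service project
gcloud compute routes list --project=HOST_PROJECT_ID --filter="network:my-vpc"
# VPC peering: confirm custom route export/import is enabled
gcloud compute networks peerings list --network=my-vpc
```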
If the issue persists, kindly reach out to Google Cloud Support for further assistance.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Hi @diannemcm ,
Thank you — you were absolutely right. The issue turned out to be with the routes defined in the host project that provides our environment. Once we adjusted those, the egress traffic started flowing correctly through the proxy VMs.
Appreciate your thorough explanation and guidance!