GKE Pod Network Issue - 1.33.4-gke.1134000

In our GKE environment, we’re running pods as KEDA jobs. Since September 18, 2025, we’ve noticed that some pods are unable to make any outgoing requests.

However, we can still log in to the affected pods and execute commands.
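For reference, this is roughly how we get a shell in an affected pod (pod and namespace names are placeholders):

$ kubectl exec -it POD_NAME -n NAMESPACE -- sh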

GKE Version (Autopilot Cluster): 1.33.4-gke.1134000

This issue doesn’t affect all pods; so far, we’ve only observed it in one specific namespace, and we haven’t been able to reproduce it consistently. The problem seems to occur with just one pod per day within that namespace.
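Since we can't reproduce it on demand, one thing we plan to check (we haven't confirmed any node correlation yet) is whether the affected pods all land on the same node; -o wide shows the node for each pod:

$ kubectl get pods -n NAMESPACE -o wide --sort-by=.spec.nodeName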

Initially, we suspected a DNS issue, but even pinging external hosts like google.com, a public IP such as 8.8.8.8, or the IP address of another pod fails.

Here’s what we’ve observed:

Ping google.com:

/ # ping -c 4 google.com
ping: bad address 'google.com'
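For what it's worth, BusyBox's ping prints "bad address" when name resolution itself fails, so a direct lookup (using the image's BusyBox nslookup, if present) separates DNS from connectivity:

/ # nslookup kubernetes.default.svc.cluster.local
/ # nslookup google.com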

Ping 8.8.8.8:

/ # ping -c 4 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
--- 8.8.8.8 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
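Since even a raw public IP fails, a traceroute with a small hop limit would show whether anything gets past the first hop at all (BusyBox ships a traceroute applet in many images):

/ # traceroute -n -m 5 8.8.8.8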

Routing Table:

/ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.56.3.1       0.0.0.0         UG    0      0        0 eth0
10.56.3.0       10.56.3.1       255.255.255.192 UG    0      0        0 eth0
10.56.3.1       0.0.0.0         255.255.255.255 UH    0      0        0 eth0
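To double-check which route and source address the kernel actually selects for an external destination, ip route get is useful (this needs iproute2, or a BusyBox ip applet built with route-get support):

/ # ip route get 8.8.8.8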

Ping Gateway (10.56.3.1):

/ # ping -c 4 10.56.3.1
PING 10.56.3.1 (10.56.3.1): 56 data bytes
--- 10.56.3.1 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
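Since the default gateway doesn't even answer ICMP, an ARP-level probe would tell us whether layer-2 resolution works at all (arping is a BusyBox applet, though not every image enables it):

/ # arping -c 4 -I eth0 10.56.3.1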

DNS Configuration:

/ # cat /etc/resolv.conf
search NAMESPACE.svc.cluster.local svc.cluster.local cluster.local c.CLUSTER_NAME.internal google.internal
nameserver NAMESPACE_SERVER_ADDRESS
options ndots:5
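The nameserver is redacted above, but it should match the kube-dns ClusterIP, which is easy to cross-check from outside the pod:

$ kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'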

Network Configuration:

/ # arp -a
? (10.56.3.1) at <incomplete>  on eth0
/ # ^C
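The <incomplete> entry means the pod's ARP requests for the gateway are never answered. If tcpdump is available in (or can be added to) the image, watching ARP traffic on eth0 would confirm that requests go out and no replies come back:

/ # tcpdump -ni eth0 arp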

/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0@if29: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 8896 qdisc noqueue state UP qlen 1000
    link/ether b6:c2:de:40:00:09 brd ff:ff:ff:ff:ff:ff
    inet 10.56.3.24/26 brd 10.56.3.63 scope global eth0
       valid_lft forever preferred_lft forever

/ # ifconfig
eth0      Link encap:Ethernet  HWaddr B6:C2:DE:40:00:09
          inet addr:10.56.3.24  Bcast:10.56.3.63  Mask:255.255.255.192
          UP BROADCAST RUNNING MULTICAST  MTU:8896  Metric:1
          RX packets:5 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10917 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:446 (446.0 B)  TX bytes:458778 (448.0 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:7376 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7376 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:967978 (945.2 KiB)  TX bytes:967978 (945.2 KiB)

This leads us to believe the issue involves network connectivity or routing for this specific pod. The interface counters point the same way: eth0 shows 10,917 packets transmitted but only 5 received, so traffic is being handed to the interface while essentially nothing comes back.
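Until this is root-caused, we're considering a preflight probe at job start so that an affected pod fails fast and gets rescheduled instead of hanging; a minimal sketch (the timeout values are illustrative, and this only catches pods that start broken, not ones that lose networking mid-run):

#!/bin/sh
# Hypothetical entrypoint wrapper for the KEDA job container:
# abort immediately if the default gateway doesn't respond,
# so the job is retried on a (hopefully) healthy pod.
GATEWAY=$(ip route | awk '/^default/ {print $3}')
if ! ping -c 2 -W 2 "$GATEWAY" >/dev/null 2>&1; then
  echo "preflight: gateway $GATEWAY unreachable, aborting" >&2
  exit 1
fi
exec "$@"  # hand off to the real job command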

We also noticed that this GKE version was released on September 18, 2025, and we started seeing the issue on September 19. We've been running these jobs for the past six months and had never hit anything like this before (see the GKE release notes, Regular channel).
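For anyone trying to correlate this with their own clusters, the exact control-plane/node versions and release channel can be pulled with (cluster name and region are placeholders):

$ gcloud container clusters describe CLUSTER_NAME --region REGION --format='value(currentMasterVersion,currentNodeVersion,releaseChannel.channel)'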
