Hello,
In my Logs explorer I can see a lot of errors from anetd pods:
"level":"error", "msg":"Found a foreign IP address with the ID of the current node",
This issue looks connected to node pool autoscaling: after a node is scaled down, the error starts to fire. There are millions of rows every hour, which significantly increases our logging costs.
I had the same issue just recently. As a consequence, the running node was forcefully replaced by a new node. That’s quite a critical case, because it caused a service interruption.
Example of log:
{
"insertId": "g7ch9dngur3jk9x1",
"jsonPayload": {
"ipAddr": "10.128.15.212",
"nodeName": "<censored>-dd7e86ff-7wk9",
"msg": "Found a foreign IP address with the ID of the current node",
"subsys": "linux-datapath",
"level": "error",
"nodeID": 33090
},
"resource": {
"type": "k8s_container",
"labels": {
"namespace_name": "kube-system",
"project_id": "<censored>-production",
"container_name": "cilium-agent",
"location": "europe-west6",
"pod_name": "anetd-z7kzn",
"cluster_name": "<censored>"
}
},
"timestamp": "2024-11-28T13:22:18.236216656Z",
"severity": "ERROR",
"labels": {
"k8s-pod/controller-revision-hash": "7c6877ffb5",
"compute.googleapis.com/resource_name": "<censored>-fc958c7d-t7rl",
"k8s-pod/k8s-app": "cilium",
"k8s-pod/pod-template-generation": "26"
},
"logName": "projects/<censored>/logs/stderr",
"receiveTimestamp": "2024-11-28T13:22:22.383685693Z"
}
I found the same errors in our clusters while chasing down severe networking issues in one of them after the 1.31 upgrade. Still not sure whether I’m on the right track with this error, but I would appreciate any additional info on this.
@mykhailo_p it was a one-off issue. It seems to be caused by the GKE engine, and therefore nothing can be done about it at the client level. At least, that is my assumption. The good thing is that it happened only once for me.
The issue originates from Cilium as deployed on Dataplane V2 clusters. Beginning with GKE v1.28, Dataplane V2 clusters may enter a state that results in the following log message: “Found a foreign IP address with the ID of the current node”. The message is produced by the cilium-agent container in the anetd node agent Pod:
resource.type="k8s_container"
resource.labels.namespace_name="kube-system"
resource.labels.container_name="cilium-agent"
jsonPayload.subsys="linux-datapath"
jsonPayload.msg="Found a foreign IP address with the ID of the current node"
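If you want to gauge the log volume before adding an exclusion, the same filter can also be used from a script. A minimal sketch, assuming the google-cloud-logging Python client library and Application Default Credentials; the `count_foreign_ip_errors` helper is illustrative, not part of any Google API:

```python
# The same filter as the Logs Explorer query above, joined with AND.
FOREIGN_IP_FILTER = " AND ".join([
    'resource.type="k8s_container"',
    'resource.labels.namespace_name="kube-system"',
    'resource.labels.container_name="cilium-agent"',
    'jsonPayload.subsys="linux-datapath"',
    'jsonPayload.msg="Found a foreign IP address with the ID of the current node"',
])

def count_foreign_ip_errors(project_id: str, max_results: int = 1000) -> int:
    """Count matching entries (capped at max_results) to estimate volume.

    Assumes `pip install google-cloud-logging` and valid credentials.
    """
    from google.cloud import logging as cloud_logging  # third-party client

    client = cloud_logging.Client(project=project_id)
    entries = client.list_entries(filter_=FOREIGN_IP_FILTER,
                                  max_results=max_results)
    return sum(1 for _ in entries)
```

If the count hits the cap within a short time window, the exclusion filter below is likely worth the trade-off.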
The issue is triggered when the anetd node agent misses one or more Node deletion events. When the podCIDR of a deleted Node is later reused, the agent that missed the event ends up with an inconsistent view of the (Cilium) node ID. This inconsistency does not affect GKE clusters, because GKE does not enable the features that rely on this subsystem. The issue and its solution are described in more detail in the fix.
Google Cloud Engineering is qualifying the fix to roll it out in future GKE releases. In the meantime, you may drop the logs by adding an exclusion filter.
gcloud CLI
gcloud logging sinks update _Default \
--add-exclusion='name=exclude-cilium-logs,filter=resource.type="k8s_container" AND resource.labels.namespace_name="kube-system" AND resource.labels.container_name="cilium-agent" AND jsonPayload.msg="Found a foreign IP address with the ID of the current node"'
Google Cloud console
In the Google Cloud console, go to the Log Router page and click Edit sink in the drop-down menu next to the _Default log router sink.
On the Edit logs routing sink screen, under Choose logs to filter out of sink, configure an exclusion filter that matches the cilium-agent container in the kube-system namespace:
resource.type="k8s_container" AND
resource.labels.namespace_name="kube-system" AND
resource.labels.container_name="cilium-agent" AND
jsonPayload.msg="Found a foreign IP address with the ID of the current node"
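As a quick sanity check that the filter drops the intended entries and nothing else, the four AND-ed clauses can be replayed locally against a LogEntry-shaped dict. A minimal sketch; the `matches_exclusion` helper is hypothetical and only mirrors the filter semantics for exact-match fields:

```python
FOREIGN_IP_MSG = "Found a foreign IP address with the ID of the current node"

def matches_exclusion(entry: dict) -> bool:
    """Mirror the four AND-ed clauses of the Log Router exclusion filter."""
    resource = entry.get("resource", {})
    labels = resource.get("labels", {})
    return (
        resource.get("type") == "k8s_container"
        and labels.get("namespace_name") == "kube-system"
        and labels.get("container_name") == "cilium-agent"
        and entry.get("jsonPayload", {}).get("msg") == FOREIGN_IP_MSG
    )

# A trimmed version of the entry from this thread should match...
cilium_entry = {
    "jsonPayload": {"msg": FOREIGN_IP_MSG, "subsys": "linux-datapath"},
    "resource": {"type": "k8s_container",
                 "labels": {"namespace_name": "kube-system",
                            "container_name": "cilium-agent"}},
}
# ...while an ordinary workload log must not be excluded.
app_entry = {
    "jsonPayload": {"msg": "request handled"},
    "resource": {"type": "k8s_container",
                 "labels": {"namespace_name": "default",
                            "container_name": "my-app"}},
}
```

Note that the exclusion keys on the exact error message, so other (potentially useful) cilium-agent logs from kube-system are still routed to the sink.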