I have created an HTTPS ‘gce-internal’ ingress that routes traffic coming into my Anthos cluster. The ingress was working as expected until today, when it suddenly started throwing the following error: Error syncing to GCP: error running backend syncing routine: googleapi: Error 404: The resource ‘projects/xxxxxx/zones/europe-west2-a/networkEndpointGroups/xxxxxx’ was not found. The strangest part is that the error is showing up for an API that was already working fine under the same ingress. I have tried different approaches to fix the issue, with no luck so far. Following is the manifest for the ingress file -
Based on the error you’re getting, there seems to be a version mismatch. I have no access to your project, so it would help if you added logs to your question. They would also show what changes were made to your configuration just before the error “Error syncing to GCP: error running backend syncing routine:” appeared.
For now, my suggestion is to delete the existing ingress and create a new one.
I started seeing this error yesterday on a regular GKE cluster after rolling out a new container image for a StatefulSet. This cluster has been running in production for about 2 years and it’s the first time I’ve seen this error.
Here’s what I’ve observed:
When I looked at the Ingress events, I saw the error message: Error syncing to GCP: error running backend syncing routine: googleapi: Error 404: The resource ‘projects/xxxxx/zones/europe-west1-b/networkEndpointGroups/xxxxx’ was not found, notFound.
I confirmed in GCP console that the NEG was never created.
I confirmed that a ServiceNetworkEndpointGroup resource was created but its lastSyncTime field was always null.
When I looked at the Service resource events, I saw this error message: error processing service “xxxxx/xxxxx”: NEG syncer for xxxxx/xxxxxx/80-8000-GCE_VM_IP_PORT-L7 is shutting down.
I tried to delete the Ingress as suggested, but that didn’t help. I even tried deleting the Service, the StatefulSet and the Ingress, waited until all the Load Balancer resources were not visible anymore on GCP console, recreated the resources and still got the exact same error. Strangely it’s only this Service that’s having this error; all other services work perfectly.
What’s even stranger is that I was finally able to work around the issue by creating a Service with a different name. The new Service is exactly the same as the old one except for the metadata.name field. If I try to create the Service with the old name, the problem happens again.
We experienced a similar issue. It turned out we had introduced a typo in the network endpoint group annotation in our Service manifest. When we recreated the Service, the NEG was not recreated.
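For anyone who wants to rule out the same kind of typo, here is a rough sketch of a check that could be run against a Service's annotations before applying the manifest. It assumes the annotation in question is `cloud.google.com/neg` with a JSON value using the `ingress` and `exposed_ports` fields; the function name `check_neg_annotation` and the similarity cutoff are made up for illustration, and this is not an official validator.

```python
import json
from difflib import get_close_matches

# Assumption: the NEG annotation key used for container-native load balancing.
NEG_KEY = "cloud.google.com/neg"
# Assumption: the top-level fields normally seen in the annotation value.
KNOWN_FIELDS = {"ingress", "exposed_ports"}


def check_neg_annotation(annotations):
    """Return a list of human-readable problems found in the NEG annotation.

    `annotations` is the metadata.annotations mapping of a Service manifest.
    An empty list means nothing suspicious was found.
    """
    problems = []

    if NEG_KEY not in annotations:
        # Look for a key that is almost, but not exactly, the NEG key.
        near = get_close_matches(NEG_KEY, list(annotations), n=1, cutoff=0.8)
        if near:
            problems.append(
                f"annotation key {near[0]!r} looks like a typo of {NEG_KEY!r}"
            )
        else:
            problems.append(f"annotation {NEG_KEY!r} is missing")
        return problems

    try:
        value = json.loads(annotations[NEG_KEY])
    except json.JSONDecodeError as exc:
        problems.append(f"annotation value is not valid JSON: {exc.msg}")
        return problems

    if not isinstance(value, dict):
        problems.append("annotation value must be a JSON object")
        return problems

    # Flag any field this sketch does not recognise; it may be a misspelling.
    for field in value:
        if field not in KNOWN_FIELDS:
            problems.append(f"unknown field {field!r} in annotation value")

    return problems


if __name__ == "__main__":
    # Hypothetical annotations with a misspelled field name.
    for problem in check_neg_annotation({NEG_KEY: '{"ingres": true}'}):
        print(problem)
```

The check only looks at the manifest text; it cannot tell whether the NEG actually exists in GCP, so it complements rather than replaces looking at the Ingress and Service events.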