We have a GKE cluster on Kubernetes version 1.33. The cluster had one node pool with no nodes associated with it, plus other NAP-managed node pools.
There was a notification in console indicating:
Node service account in cluster is missing roles/container.defaultNodeServiceAccount, which results in degraded operations, such as impeded logging, monitoring or performance HPA.
So we added the following role to the default compute service account that GKE uses, via Terraform:
resource "google_project_iam_member" "default_sa_container_role" {
  project = data.google_client_config.default.project
  role    = "roles/container.defaultNodeServiceAccount"
  member  = "serviceAccount:${data.google_project.default.number}-compute@developer.gserviceaccount.com"
}

A day after adding the container.defaultNodeServiceAccount role to the service account, a new node came up in the old node pool that was already present in the cluster. A few hours later we tried to create another node and it failed with an internal error. We could not create a node pool at all, neither through NAP nor manually; both attempts returned the same error.
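For completeness, the member string above interpolates the project number from two provider data sources. Assuming the names used in the snippet, a minimal sketch of their declarations would be:

```hcl
# Sketch only: data source names ("default") are assumed from the
# snippet above, not verified against our actual configuration.
data "google_client_config" "default" {}

data "google_project" "default" {
  # Resolve the project currently configured on the provider,
  # so .number yields its numeric project number.
  project_id = data.google_client_config.default.project
}
```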
Why would adding the role cause this problem? And how could the new node come up and run when attached to the "old" node pool, while creating a new node pool failed?