ImagePullBackOff using custom image

Hi,

I’ve created a GKE cluster using the Terraform module. I’ve given the service account the required permissions and verified the Artifact Registry image URL, but my deployment keeps failing with an ImagePullBackOff error.
I used a public GCP image without issue.
What am I missing?
Thanks


Which permission(s) did you grant to the node service account? And is Artifact Registry in the same project or in a different one?

Hi,

It’s all in the same project, and the SA has the Artifact Registry Reader and Storage Viewer roles.

Could you post some details of your deployment via kubectl? (kubectl describe deployment <deployment_name>)

Thanks 🙂

I solved it by linking the SA to Workload Identity.

https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
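For anyone landing here later, the binding looks roughly like this. This is a sketch, not my exact config: the KSA name, GSA name, and project ID below are placeholders, not values from this thread.

```yaml
# Sketch: a Kubernetes ServiceAccount annotated to impersonate a Google
# service account via Workload Identity. my-ksa, my-gsa and my-project
# are hypothetical names.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-ksa
  namespace: sdk-apps
  annotations:
    iam.gke.io/gcp-service-account: my-gsa@my-project.iam.gserviceaccount.com
```

The Google SA also needs the roles/iam.workloadIdentityUser role granted to the member serviceAccount:my-project.svc.id.goog[sdk-apps/my-ksa] (e.g. with gcloud iam service-accounts add-iam-policy-binding), and the pod spec has to reference the KSA.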

Hi,

I continued trying to solve the issue and created a new node pool for ARM64 workloads, since the original image was built for the ARM architecture.
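Side note for readers hitting the same mismatch: instead of a dedicated node pool, the image can also be published for both architectures. A hedged sketch of the usual approach (the image path is the one from this thread; the builder name is made up):

```shell
# Sketch: build and push a multi-arch image with docker buildx.
# "mybuilder" is a hypothetical builder name.
docker buildx create --name mybuilder --use
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t us-west1-docker.pkg.dev/topia-gcp/sdk-apps/sdk-virtual-pet \
  --push .
```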
Here is the deployment description (env vars removed):

Name:                   sdk-virtual-pet
Namespace:              sdk-apps
CreationTimestamp:      Sun, 03 Sep 2023 18:18:39 +0000
Labels:
Annotations:            deployment.kubernetes.io/revision: 3
Selector:               app=sdk-virtual-pet-app
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=sdk-virtual-pet-app
  Containers:
   sdk-virtual-pet:
    Image:      us-west1-docker.pkg.dev/topia-gcp/sdk-apps/sdk-virtual-pet
    Port:       3000/TCP
    Host Port:  0/TCP
    Environment:
    Mounts:
  Volumes:
Conditions:
  Type         Status  Reason
  ----         ------  ------
  Progressing  True    NewReplicaSetAvailable
  Available    False   MinimumReplicasUnavailable
OldReplicaSets:  sdk-virtual-pet-68784fdddc (0/0 replicas created), sdk-virtual-pet-56775fb76 (0/0 replicas created)
NewReplicaSet:   sdk-virtual-pet-6b84595958 (1/1 replicas created)
Events:
  Type    Reason             Age                From                   Message
  ----    ------             ---                ----                   -------
  Normal  ScalingReplicaSet  56m (x2 over 9h)   deployment-controller  Scaled up replica set sdk-virtual-pet-56775fb76 to 1 from 0
  Normal  ScalingReplicaSet  23m (x2 over 57m)  deployment-controller  Scaled down replica set sdk-virtual-pet-56775fb76 to 0 from 1
  Normal  ScalingReplicaSet  22m                deployment-controller  Scaled up replica set sdk-virtual-pet-6b84595958 to 1 from 0

I kept trying to get the GKE cluster to deploy an ARM image, but it always fails.
My ARM64 node pool has this taint:

Taints: kubernetes.io/arch=arm64:NoSchedule

I added a nodeSelector for that architecture:

nodeSelector:
  kubernetes.io/arch: arm64

and I still get this error:

Cannot schedule pods: Preemption is not helpful for scheduling.

reason: {
  messageId: "no.scale.up.mig.failing.predicate"
  parameters: [
    0: "TaintToleration"
    1: "node(s) had untolerated taint {kubernetes.io/arch: arm64}"
  ]
}
If I remove the nodeSelector, I get an error that the nodes have taints; if I add it back, I get the error above about the untolerated taint.
This has taken me too many days and I can’t get rid of the errors. Can someone help me understand what might be wrong?
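One thing worth spelling out for readers: a nodeSelector only asks the scheduler for nodes carrying that label; a NoSchedule taint separately needs a matching toleration in the pod spec. A minimal sketch, using the label and taint values from this thread:

```yaml
# Sketch: pod spec fragment pairing the arm64 nodeSelector with a
# toleration for the arm64 NoSchedule taint shown above.
spec:
  nodeSelector:
    kubernetes.io/arch: arm64
  tolerations:
    - key: kubernetes.io/arch
      operator: Equal
      value: arm64
      effect: NoSchedule
```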
Here are the definitions:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "7"
  creationTimestamp: "2023-09-03T18:18:39Z"
  generation: 22
  managedFields:
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:progressDeadlineSeconds: {}
          f:revisionHistoryLimit: {}
          f:selector: {}
          f:strategy:
            f:rollingUpdate:
              .: {}
              f:maxSurge: {}
              f:maxUnavailable: {}
            f:type: {}
          f:template:
            f:metadata:
              f:labels:
                .: {}
                f:app: {}
            f:spec:
              f:automountServiceAccountToken: {}
              f:containers:
                k:{"name":"sdk-virtual-pet"}:
                  .: {}
                  f:env:
                    .: {}
                    k:{"name":"API_KEY"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:secretKeyRef: {}
                    k:{"name":"BROWSER"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:secretKeyRef: {}
                    k:{"name":"IMG_ASSET_ID"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:secretKeyRef: {}
                    k:{"name":"INSTANCE_DOMAIN"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:secretKeyRef: {}
                    k:{"name":"INSTANCE_PROTOCOL"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:secretKeyRef: {}
                    k:{"name":"INTERACTIVE_KEY"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:secretKeyRef: {}
                    k:{"name":"INTERACTIVE_SECRET"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:secretKeyRef: {}
                    k:{"name":"NODE_ENV"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:secretKeyRef: {}
                    k:{"name":"PORT"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:secretKeyRef: {}
                    k:{"name":"REACT_APP_API_URL"}:
                      .: {}
                      f:name: {}
                      f:valueFrom:
                        .: {}
                        f:secretKeyRef: {}
                  f:image: {}
                  f:imagePullPolicy: {}
                  f:name: {}
                  f:ports:
                    .: {}
                    k:{"containerPort":3000,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:protocol: {}
                  f:resources: {}
                  f:terminationMessagePath: {}
                  f:terminationMessagePolicy: {}
              f:dnsPolicy: {}
              f:enableServiceLinks: {}
              f:imagePullSecrets:
                .: {}
                k:{"name":"service-account-secret"}: {}
              f:restartPolicy: {}
              f:schedulerName: {}
              f:securityContext: {}
              f:shareProcessNamespace: {}
              f:terminationGracePeriodSeconds: {}
      manager: HashiCorp
      operation: Update
      time: "2023-09-06T03:03:02Z"
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:replicas: {}
          f:template:
            f:spec:
              f:nodeSelector: {}
      manager: GoogleCloudConsole
      operation: Update
      time: "2023-09-18T06:15:03Z"
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:deployment.kubernetes.io/revision: {}
        f:status:
          f:conditions:
            .: {}
            k:{"type":"Available"}:
              .: {}
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
            k:{"type":"Progressing"}:
              .: {}
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
          f:observedGeneration: {}
          f:replicas: {}
          f:unavailableReplicas: {}
          f:updatedReplicas: {}
      manager: kube-controller-manager
      operation: Update
      subresource: status
      time: "2023-09-18T06:28:19Z"
  name: sdk-virtual-pet
  namespace: sdk-apps
  resourceVersion: "11816664"
  uid: 981a888d-cbe1-42da-8af9-84961b4aa43b
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: sdk-virtual-pet-app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: sdk-virtual-pet-app
    spec:
      automountServiceAccountToken: true
      containers:
        - env:
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  key: API_KEY
                  name: sdk-virtual-pet
                  optional: false
            - name: BROWSER
              valueFrom:
                secretKeyRef:
                  key: BROWSER
                  name: sdk-virtual-pet
                  optional: false
            - name: IMG_ASSET_ID
              valueFrom:
                secretKeyRef:
                  key: IMG_ASSET_ID
                  name: sdk-virtual-pet
                  optional: false
            - name: INSTANCE_DOMAIN
              valueFrom:
                secretKeyRef:
                  key: INSTANCE_DOMAIN
                  name: sdk-virtual-pet
                  optional: false
            - name: INSTANCE_PROTOCOL
              valueFrom:
                secretKeyRef:
                  key: INSTANCE_PROTOCOL
                  name: sdk-virtual-pet
                  optional: false
            - name: INTERACTIVE_KEY
              valueFrom:
                secretKeyRef:
                  key: INTERACTIVE_KEY
                  name: sdk-virtual-pet
                  optional: false
            - name: INTERACTIVE_SECRET
              valueFrom:
                secretKeyRef:
                  key: INTERACTIVE_SECRET
                  name: sdk-virtual-pet
                  optional: false
            - name: NODE_ENV
              valueFrom:
                secretKeyRef:
                  key: NODE_ENV
                  name: sdk-virtual-pet
                  optional: false
            - name: PORT
              valueFrom:
                secretKeyRef:
                  key: PORT
                  name: sdk-virtual-pet
                  optional: false
            - name: REACT_APP_API_URL
              valueFrom:
                secretKeyRef:
                  key: REACT_APP_API_URL
                  name: sdk-virtual-pet
                  optional: false
          image: us-west1-docker.pkg.dev/topia-gcp/sdk-apps/sdk-virtual-pet
          imagePullPolicy: Always
          name: sdk-virtual-pet
          ports:
            - containerPort: 3000
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      imagePullSecrets:
        - name: service-account-secret
      nodeSelector:
        kubernetes.io/arch: arm64
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      shareProcessNamespace: false
      terminationGracePeriodSeconds: 30
status:
  conditions:
    - lastTransitionTime: "2023-09-18T05:46:39Z"
      lastUpdateTime: "2023-09-18T05:46:39Z"
      message: Deployment does not have minimum availability.
      reason: MinimumReplicasUnavailable
      status: "False"
      type: Available
    - lastTransitionTime: "2023-09-18T06:28:19Z"
      lastUpdateTime: "2023-09-18T06:28:19Z"
      message: ReplicaSet "sdk-virtual-pet-bd9f6567c" is progressing.
      reason: ReplicaSetUpdated
      status: "True"
      type: Progressing
  observedGeneration: 22
  replicas: 2
  unavailableReplicas: 2
  updatedReplicas: 1

kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
gke-topia-gke-arm-node-pool-044e9d2e-lwdk Ready 12d v1.27.3-gke.100 arch=arm,beta.kubernetes.io/arch=arm64,beta.kubernetes.io/instance-type=t2a-standard-2,beta.kubernetes.io/os=linux,cloud.google.com/gke-boot-disk=pd-standard,cloud.google.com/gke-container-runtime=containerd,cloud.google.com/gke-cpu-scaling-level=2,cloud.google.com/gke-logging-variant=DEFAULT,cloud.google.com/gke-max-pods-per-node=110,cloud.google.com/gke-netd-ready=true,cloud.google.com/gke-nodepool=arm-node-pool,cloud.google.com/gke-os-distribution=cos,cloud.google.com/gke-provisioning=standard,cloud.google.com/gke-stack-type=IPV4,cloud.google.com/machine-family=t2a,cloud.google.com/private-node=false,cluster_name=topia-gke,default-node-pool=true,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,iam.gke.io/gke-metadata-server-enabled=true,kubernetes.io/arch=arm64,kubernetes.io/hostname=gke-topia-gke-arm-node-pool-044e9d2e-lwdk,kubernetes.io/os=linux,node.kubernetes.io/instance-type=t2a-standard-2,node_pool=arm-node-pool,topology.gke.io/zone=us-central1-b,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-b
gke-topia-gke-arm-node-pool-b1c1fb22-prrs Ready 12d v1.27.3-gke.100 arch=arm,beta.kubernetes.io/arch=arm64,beta.kubernetes.io/instance-type=t2a-standard-2,beta.kubernetes.io/os=linux,cloud.google.com/gke-boot-disk=pd-standard,cloud.google.com/gke-container-runtime=containerd,cloud.google.com/gke-cpu-scaling-level=2,cloud.google.com/gke-logging-variant=DEFAULT,cloud.google.com/gke-max-pods-per-node=110,cloud.google.com/gke-netd-ready=true,cloud.google.com/gke-nodepool=arm-node-pool,cloud.google.com/gke-os-distribution=cos,cloud.google.com/gke-provisioning=standard,cloud.google.com/gke-stack-type=IPV4,cloud.google.com/machine-family=t2a,cloud.google.com/private-node=false,cluster_name=topia-gke,default-node-pool=true,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,iam.gke.io/gke-metadata-server-enabled=true,kubernetes.io/arch=arm64,kubernetes.io/hostname=gke-topia-gke-arm-node-pool-b1c1fb22-prrs,kubernetes.io/os=linux,node.kubernetes.io/instance-type=t2a-standard-2,node_pool=arm-node-pool,topology.gke.io/zone=us-central1-a,topology.kubernetes.io/region=us-central1,topology.kubernetes.io/zone=us-central1-a
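The labels above do include kubernetes.io/arch=arm64, so the nodeSelector should match; the remaining thing to compare is the taints the scheduler actually sees on each node. One quick way to print them:

```shell
# Print each node's name and taints as the scheduler sees them.
kubectl describe nodes | grep -E '^(Name|Taints):'
```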