I am running a Pod that pulls a 12 GB image. The Pod keeps failing with an ImagePullBackOff error.
I SSH'ed into the GKE node on which the Pod is scheduled. journalctl on the node keeps emitting the message below:
Dec 20 22:12:27 gke-subsalt-cluster–pool-gpu-nodepoo-687078b8-lp6n containerd[2640]: time="2023-12-20T22:12:27.342460503Z" level=error msg="cancel pulling image gcr.io/xxxxx/subsalt-ray:2023-12-20_7c00fb9 because of no progress in 1m0s"
GKE cluster version: 1.27.4-gke.900. It is a Standard GKE cluster, and it is a private cluster.
Please let me know what other details might be helpful to debug this.
Other Pods are running fine on the same node, including the Pods in the kube-system namespace.