Hi, I am trying to set up an AI backend on GKE Autopilot with GPUs.
When I set the GPU to nvidia-tesla-t4, the Pod gets stuck in a Pending state indefinitely. When I set the GPU to nvidia-l4 instead, I get a “scale.up.error.quota.exceeded” event, even though I believe I have the quota.
How should I set this up properly?
Can you share your Pod manifest?
Here are my Service, Deployment, and PV/PVC YAML:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ckpt-pv
spec:
  # balanced persistent disk
  storageClassName: "standard-rwo"
  capacity:
    storage: 100G
  accessModes:
    - ReadOnlyMany
  claimRef:
    namespace: default
    name: ckpt-pvclaim
  csi:
    driver: pd.csi.storage.gke.io
    # https://cloud.google.com/compute/docs/gpus/create-gpu-vm-accelerator-optimized#limitations
    # regional disk is not supported for nvidia-l4 gpu (g2 vm type)
    volumeHandle: projects/passionboost/zones/us-central1-a/disks/ckpt-zonal
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: default
  name: ckpt-pvclaim
spec:
  storageClassName: "standard-rwo"
  volumeName: ckpt-pv
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100G
---
kind: Service
apiVersion: v1
metadata:
  name: torchserve
  labels:
    app: torchserve
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: '8082'
spec:
  ports:
    - name: preds
      port: 8080
      targetPort: ts
    - name: mdl
      port: 8081
      targetPort: ts-management
    - name: metrics
      port: 8082
      targetPort: ts-metrics
    - name: grpc
      port: 7070
      targetPort: ts-grpc
  selector:
    app: torchserve
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: torchserve
  labels:
    app: torchserve
spec:
  replicas: 1
  selector:
    matchLabels:
      app: torchserve
  template:
    metadata:
      labels:
        app: torchserve
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-tesla-t4
        # cloud.google.com/gke-accelerator: nvidia-l4
        cloud.google.com/gke-accelerator-count: "1"
        # https://cloud.google.com/kubernetes-engine/docs/how-to/gke-zonal-topology#nodeselector-placement
        topology.kubernetes.io/zone: "us-central1-a"
      volumes:
        - name: ckpt-volume
          persistentVolumeClaim:
            claimName: ckpt-pvclaim
            readOnly: true
      containers:
        - name: torchserve
          image: us-central1-docker.pkg.dev/passionboost/autopilot:testing
          command: ["torchserve", "--start", "--models=no-model.mar", "--model-store", "/home/model-server/model-store/", "--ts-config=config.properties"]
          ports:
            - name: ts
              containerPort: 8080
            - name: ts-management
              containerPort: 8081
            - name: ts-metrics
              containerPort: 8082
            - name: ts-grpc
              containerPort: 7070
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - mountPath: /home/model-server/ckpt
              name: ckpt-volume
          resources:
            limits:
              cpu: 4
              memory: 20Gi
              nvidia.com/gpu: 1
When I set the GPU to “nvidia-tesla-t4”, I get the following Autopilot log repeatedly, and the Pod stays in a Pending state forever:
“scale.up.error.out.of.resources”
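(For anyone trying to reproduce this: the message shows up in the cluster autoscaler visibility logs. A rough sketch of the query I mean is below; the project ID is the one from my manifests, and the exact filter may need adjusting.)

# Pull recent cluster-autoscaler visibility events (noScaleUp / scaleUp errors)
gcloud logging read \
  'logName:"cluster-autoscaler-visibility" AND resource.type="k8s_cluster"' \
  --project=passionboost --limit=20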
Maybe you do not have enough storage quota in the T4 case?
For the L4 case, when you go to the quotas page, does it show any L4’s in use?
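If it’s easier from the CLI, you can also dump the regional quotas and grep the GPU entries. A rough sketch, assuming the us-central1 region from your disk path:

# GPU quotas show up as NVIDIA_T4_GPUS, NVIDIA_L4_GPUS, etc., with limit and usage
gcloud compute regions describe us-central1 \
  --format="yaml(quotas)" | grep -B1 -A1 NVIDIA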
In the T4 case,
do you mean the disks by “storage”? I don’t think so, because I didn’t receive any quota-related error messages, and I had already formatted and mounted the target persistent disk beforehand.
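For what it’s worth, this is roughly how I verified the disk exists (name and zone taken from the volumeHandle in my PV above):

# Confirm the pre-created zonal PD referenced by the PV actually exists
gcloud compute disks describe ckpt-zonal --zone=us-central1-a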
In the L4 case,
I have to double-check that, but I am pretty sure I wasn’t using any L4 GPUs elsewhere.
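I’ll probably check it with something like this (assuming L4s only come attached to g2 machine types, as the comment in my PV notes):

# Any standalone G2 VMs (the machine family that carries L4 GPUs)?
gcloud compute instances list --filter="machineType~g2-"

# Any existing L4 nodes in the cluster?
kubectl get nodes -l cloud.google.com/gke-accelerator=nvidia-l4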
+1, I just used the example manifest from the docs:
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
    cloud.google.com/gke-accelerator-count: "1"
  containers:
    - name: my-gpu-container
      image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
      command: ["/bin/bash", "-c", "--"]
      args: ["while true; do sleep 600; done;"]
      resources:
        limits:
          nvidia.com/gpu: 1
It’s always in a “Pending” state, and this is what I get from the events:
8m34s Warning FailedScaleUp pod/my-gpu-pod Node scale up in zones us-west1-a associated with this pod failed: GCE out of resources. Pod is at risk of not being scheduled.
I can confirm I have the quota
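(For reference, the events above can be pulled with the usual kubectl commands; the pod name is the one from the example manifest.)

# Events attached to the example Pod
kubectl get events --field-selector involvedObject.name=my-gpu-pod

# Scheduling status, conditions, and the same events in context
kubectl describe pod my-gpu-pod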
“GCE out of resources” usually means that there’s no capacity of that GPU type available in the zone/region right now. You’ll have to wait until enough GPUs become free, or switch to a different region or try a different GPU type.
https://cloud.google.com/kubernetes-engine/docs/troubleshooting/autopilot-clusters#scaleup-failed-out-of-resources
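If you want to see up front which zones actually offer a given GPU type before repointing the zone selector, a quick check along these lines should do it (swap the accelerator name as needed):

# Zones that offer L4 GPUs; use name=nvidia-tesla-t4 to check T4s instead
gcloud compute accelerator-types list --filter="name=nvidia-l4"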