Hi @Wen_gcp, thanks for getting back to me on this.
I have put together a minimal reproducible example based on the example job definition in the documentation.
Here is the job definition:
{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "script": {
              "text": "echo Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks."
            }
          }
        ],
        "computeResource": {
          "cpuMilli": 8000,
          "memoryMib": 30000,
          "bootDiskMib": 100000
        },
        "maxRetryCount": 2,
        "maxRunDuration": "3600s"
      },
      "parallelism": 1
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "installGpuDrivers": true,
        "policy": {
          "machineType": "n1-standard-8",
          "provisioningModel": "STANDARD",
          "accelerators": [
            {
              "type": "nvidia-tesla-t4",
              "count": 1
            }
          ],
          "disks": [
            {
              "deviceName": "additional_disk",
              "newDisk": {
                "type": "pd-ssd",
                "sizeGb": 3000
              }
            }
          ]
        }
      }
    ],
    "location": {
      "allowedLocations": [
        "zones/us-central1-b"
      ]
    }
  },
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
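As a quick sanity check on the definition above, the requested resources can be converted into more familiar units. This is only a rough sketch: the helper name `summarize_request` is made up for illustration, the embedded JSON is a trimmed copy of the job definition, and the code reads only the fields that appear in this job.

```python
import json

# Trimmed copy of the job definition above: only the fields this sketch reads.
JOB_JSON = """
{
  "taskGroups": [
    {"taskSpec": {"computeResource": {"cpuMilli": 8000, "memoryMib": 30000, "bootDiskMib": 100000}}}
  ],
  "allocationPolicy": {
    "instances": [
      {"policy": {
        "machineType": "n1-standard-8",
        "disks": [{"deviceName": "additional_disk", "newDisk": {"type": "pd-ssd", "sizeGb": 3000}}]
      }}
    ]
  }
}
"""


def summarize_request(job):
    """Hypothetical helper: convert the requested resources into vCPUs and GiB."""
    res = job["taskGroups"][0]["taskSpec"]["computeResource"]
    policy = job["allocationPolicy"]["instances"][0]["policy"]
    # Only disks created for the job ("newDisk") are counted here.
    new_disks = [d["newDisk"] for d in policy.get("disks", []) if "newDisk" in d]
    return {
        "vcpus": res["cpuMilli"] / 1000,             # milli-CPU to whole vCPUs
        "memory_gib": res["memoryMib"] / 1024,       # MiB to GiB
        "boot_disk_gib": res["bootDiskMib"] / 1024,  # MiB to GiB
        "extra_ssd_gb": sum(d["sizeGb"] for d in new_disks if d["type"] == "pd-ssd"),
    }


summary = summarize_request(json.loads(JOB_JSON))
print(summary)
```

For this job the sketch reports 8 vCPUs, which matches the n1-standard-8 machine type, plus 3000 GB of additional pd-ssd on top of the boot disk.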
Using this test batch job to answer your questions:
- UUID: minimal-reprod-ssd-822f2967-bf0a-4ccd0
- Region: us-central1
- This is the (partially redacted) output from running
gcloud batch jobs describe minimal-reprod-ssd-allocation-20240401-141753 --location us-central1
allocationPolicy:
  instances:
  - installGpuDrivers: true
    policy:
      accelerators:
      - count: '1'
        type: nvidia-tesla-t4
      disks:
      - deviceName: additional_disk
        newDisk:
          sizeGb: '3000'
          type: pd-ssd
      machineType: n1-standard-8
      provisioningModel: STANDARD
  labels:
    batch-job-id: minimal-reprod-ssd-allocation-20240401-141753
  location:
    allowedLocations:
    - regions/us-central1
    - zones/us-central1-b
  serviceAccount:
    email: REDACTED
createTime: '2024-04-01T21:17:54.225011145Z'
logsPolicy:
  destination: CLOUD_LOGGING
name: REDACTED
status:
  runDuration: 0s
  state: QUEUED
  statusEvents:
  - description: Job state is set from QUEUED to SCHEDULED for job projects/REDACTED/locations/us-central1/jobs/minimal-reprod-ssd-allocation-20240401-141753.
    eventTime: '2024-04-01T21:17:57.766324857Z'
    type: STATUS_CHANGED
  - description: "VM in Managed Instance Group meets error: Batch Error: code - CODE_GCE_ZONE_RESOURCE_POOL_EXHAUSTED,\
      \ description - error count is 7, latest message example: Instance 'minimal-reprod-ssd-822f2967-bf0a-4ccd0-group0-0-plpl'\
      \ creation failed: The zone 'projects/REDACTED/zones/us-central1-b'\
      \ does not have enough resources available to fulfill the request. Try a different\
      \ zone, or try again later."
    eventTime: '2024-04-01T21:39:35.121Z'
    type: OPERATIONAL_INFO
  - description: VMs not functioning within the time window 1080 seconds.
    eventTime: '2024-04-01T21:39:59.921550373Z'
    type: OPERATIONAL_INFO
  - description: Job state is set from SCHEDULED to SCHEDULED_PENDING_QUEUED for job
      projects/REDACTED/locations/us-central1/jobs/minimal-reprod-ssd-allocation-20240401-141753.
    eventTime: '2024-04-01T21:39:59.938074721Z'
    type: STATUS_CHANGED
  - description: Job state is set from SCHEDULED_PENDING_QUEUED to QUEUED for job
      projects/REDACTED/locations/us-central1/jobs/minimal-reprod-ssd-allocation-20240401-141753.
    eventTime: '2024-04-01T21:40:47.164395660Z'
    type: STATUS_CHANGED
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 14:50:51.649865745 -0700 PDT m=+214356.414855742.'
    eventTime: '2024-04-01T21:45:51.649940305Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 14:55:56.114827842 -0700 PDT m=+215098.907538262.'
    eventTime: '2024-04-01T21:50:56.114945351Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 15:01:00.575580174 -0700 PDT m=+215430.389426365.'
    eventTime: '2024-04-01T21:56:00.575668296Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 15:06:05.414859704 -0700 PDT m=+215160.110862867.'
    eventTime: '2024-04-01T22:01:05.414956904Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 15:11:10.885322528 -0700 PDT m=+215575.643588359.'
    eventTime: '2024-04-01T22:06:10.885438378Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 15:16:14.918018459 -0700 PDT m=+216403.603340216.'
    eventTime: '2024-04-01T22:11:14.918098459Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 15:21:19.732150066 -0700 PDT m=+216708.237442730.'
    eventTime: '2024-04-01T22:16:19.732303156Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 15:26:23.83905514 -0700 PDT m=+217115.264477295.'
    eventTime: '2024-04-01T22:21:23.839243250Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 15:31:29.303001932 -0700 PDT m=+216698.739494837.'
    eventTime: '2024-04-01T22:26:29.303094510Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 15:36:33.755483807 -0700 PDT m=+44759.027281986.'
    eventTime: '2024-04-01T22:31:33.755572287Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 15:41:38.743365183 -0700 PDT m=+217307.279607038.'
    eventTime: '2024-04-01T22:36:38.743491068Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 15:46:43.311081982 -0700 PDT m=+217683.695794508.'
    eventTime: '2024-04-01T22:41:43.311168602Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 15:51:49.216058823 -0700 PDT m=+218013.981048820.'
    eventTime: '2024-04-01T22:46:49.216128103Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 15:56:54.486013647 -0700 PDT m=+218319.251003634.'
    eventTime: '2024-04-01T22:51:54.486088217Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 16:02:01.596816769 -0700 PDT m=+219150.282138726.'
    eventTime: '2024-04-01T22:57:01.596897559Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 16:07:05.686012498 -0700 PDT m=+218818.176830465.'
    eventTime: '2024-04-01T23:02:05.686090058Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 16:12:09.859799653 -0700 PDT m=+219039.598787647.'
    eventTime: '2024-04-01T23:07:09.859892183Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 16:17:14.646791875 -0700 PDT m=+219539.411781872.'
    eventTime: '2024-04-01T23:12:14.646883345Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 16:22:19.267329037 -0700 PDT m=+219819.652041563.'
    eventTime: '2024-04-01T23:17:19.267388247Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 16:27:23.808267934 -0700 PDT m=+220148.573257921.'
    eventTime: '2024-04-01T23:22:23.808338374Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 16:32:27.823364324 -0700 PDT m=+220441.290208258.'
    eventTime: '2024-04-01T23:27:27.823450156Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 16:37:32.029049816 -0700 PDT m=+221306.354588823.'
    eventTime: '2024-04-01T23:32:32.029174329Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 16:42:36.44816026 -0700 PDT m=+220866.187148214.'
    eventTime: '2024-04-01T23:37:36.448252111Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 16:47:40.465074124 -0700 PDT m=+221269.901567030.'
    eventTime: '2024-04-01T23:42:40.465167520Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 16:52:44.688509722 -0700 PDT m=+222296.113931647.'
    eventTime: '2024-04-01T23:47:44.688586832Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 16:57:48.2128431 -0700 PDT m=+49633.484641278.'
    eventTime: '2024-04-01T23:52:48.212909480Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 17:02:54.694841039 -0700 PDT m=+222167.185659016.'
    eventTime: '2024-04-01T23:57:54.694928349Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 17:08:01.361443832 -0700 PDT m=+222476.057446895.'
    eventTime: '2024-04-02T00:03:01.361523221Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 17:13:08.196972282 -0700 PDT m=+222868.769351091.'
    eventTime: '2024-04-02T00:08:08.197066712Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 17:18:15.491008349 -0700 PDT m=+223743.701457466.'
    eventTime: '2024-04-02T00:13:15.491115359Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 17:23:22.362773795 -0700 PDT m=+224050.573223032.'
    eventTime: '2024-04-02T00:18:22.362878735Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 17:28:29.367989063 -0700 PDT m=+223619.106977007.'
    eventTime: '2024-04-02T00:23:29.368097353Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 17:33:35.984233329 -0700 PDT m=+224035.071707124.'
    eventTime: '2024-04-02T00:28:35.984316179Z'
    type: SCHEDULING_INFO
  - description: 'Quota checking process decided to delay scheduling for the job minimal-reprod-ssd-822f2967-bf0a-4ccd0
      due to inadequate quotas [Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted:
      3115.], next schedule time 2024-04-01 17:38:37.429751088 -0700 PDT m=+224972.256447834.'
    eventTime: '2024-04-02T00:33:37.429861913Z'
    type: SCHEDULING_INFO
taskGroups:
- name: REDACTED
  parallelism: '1'
  taskCount: '1'
  taskSpec:
    computeResource:
      bootDiskMib: '100000'
      cpuMilli: '8000'
      memoryMib: '30000'
    maxRetryCount: 2
    maxRunDuration: 3600s
    runnables:
    - script:
        text: echo Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total
          of ${BATCH_TASK_COUNT} tasks.
uid: minimal-reprod-ssd-822f2967-bf0a-4ccd0
updateTime: '2024-04-02T00:33:37.429861913Z'
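The repeated SCHEDULING_INFO events all carry the same quota numbers, so it may help to pull them out and look at the gap directly. Below is a minimal sketch, assuming the description text keeps the `[Quota: NAME, limit: N, usage: N, wanted: N.]` shape seen in the output above. The function name `parse_quota_event` and the regular expression are my own and not part of any Batch tooling, and the sample string is cut off right after the closing bracket.

```python
import re

# Matches the quota fragment of a Batch status event description.
# \s* tolerates the line wrapping that appears in the YAML output.
QUOTA_RE = re.compile(
    r"Quota:\s*(?P<name>\w+),\s*limit:\s*(?P<limit>\d+),\s*"
    r"usage:\s*(?P<usage>\d+),\s*wanted:\s*(?P<wanted>\d+)"
)


def parse_quota_event(description):
    """Hypothetical helper: return the quota numbers from a description, or None."""
    match = QUOTA_RE.search(description)
    if match is None:
        return None
    info = {
        "name": match["name"],
        "limit": int(match["limit"]),
        "usage": int(match["usage"]),
        "wanted": int(match["wanted"]),
    }
    # Headroom left under the limit, and how far the request overshoots it.
    info["available"] = info["limit"] - info["usage"]
    info["shortfall"] = max(0, info["wanted"] - info["available"])
    return info


sample = (
    "Quota checking process decided to delay scheduling for the job "
    "minimal-reprod-ssd-822f2967-bf0a-4ccd0 due to inadequate quotas "
    "[Quota: SSD_TOTAL_GB, limit: 16000, usage: 15975, wanted: 3115.]"
)
info = parse_quota_event(sample)
print(info)
```

With the numbers from the events above, only 25 GB of SSD_TOTAL_GB remains under the limit while the job wants 3115 GB, so the request is 3090 GB over.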