Using a custom COS-based image with Google Batch

I’m trying to use a custom image with Google Batch. I created an image, but the job fails to schedule. After a timeout, I see this in the gcloud batch jobs describe output:

statusEvents:
  - description: Job state is set from QUEUED to SCHEDULED for job projects/642504272574/locations/us-central1/jobs/avishai-test-202503111453.
    eventTime: '2025-03-11T21:54:07.741553667Z'
    type: STATUS_CHANGED
  - description: no VM has agent reporting correctly within the time window 1080 seconds.
      VM state for instance avishai-test-20250-bd8fccca-7947-48610-group0-0-01s2 is
      2025/03/11-21:55:30+0000,startup,68,unsupported_cos.
    eventTime: '2025-03-11T22:14:34.495983782Z'
    type: OPERATIONAL_INFO
  - description: Job state is set from SCHEDULED to SCHEDULED_PENDING_FAILED for job
      projects/642504272574/locations/us-central1/jobs/avishai-test-202503111453.
    eventTime: '2025-03-11T22:14:34.515776001Z'
    type: STATUS_CHANGED
  - description: Job state is set from SCHEDULED_PENDING_FAILED to FAILED for job
      projects/642504272574/locations/us-central1/jobs/avishai-test-202503111453.
    eventTime: '2025-03-11T22:15:53.759661794Z'
    type: STATUS_CHANGED
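
For anyone scanning similar output, the useful part is the VM state string inside the OPERATIONAL_INFO event. Below is a minimal Python sketch that pulls it apart, assuming the comma-separated layout seen above (timestamp, phase, numeric code, reason). The field names are my own labels, not documented Batch names.

```python
import re
from typing import NamedTuple, Optional


class VmState(NamedTuple):
    # Field names are illustrative labels, not official Batch terminology.
    instance: str
    timestamp: str
    phase: str
    code: int
    reason: str


# Matches "VM state for instance <name> is <timestamp>,<phase>,<code>,<reason>".
_VM_STATE = re.compile(
    r"VM state for instance (\S+) is\s+([^,\s]+),([^,]+),(\d+),([A-Za-z0-9_]+)"
)


def parse_vm_state(description: str) -> Optional[VmState]:
    """Return the VM state embedded in an event description, or None."""
    match = _VM_STATE.search(description)
    if match is None:
        return None
    instance, timestamp, phase, code, reason = match.groups()
    return VmState(instance, timestamp, phase, int(code), reason)


description = (
    "no VM has agent reporting correctly within the time window 1080 seconds. "
    "VM state for instance avishai-test-20250-bd8fccca-7947-48610-group0-0-01s2 is "
    "2025/03/11-21:55:30+0000,startup,68,unsupported_cos."
)
state = parse_vm_state(description)
print(state)
```

Here the reason field comes out as unsupported_cos, which is what made me suspect the base image.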

Does the custom image need to be created from a batch-cos-stable-official boot disk image? If so, I’d like some guidance on how to create an instance from that image (those images aren’t available for selection, at least from the console UI).
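
Images that the console does not list can usually still be referenced from the CLI by image project and image family. Below is a small sketch that only assembles and prints such a gcloud command without running it. It assumes the family lives in the batch-custom-image project and is readable from your project; the instance name and zone are hypothetical placeholders.

```python
import shlex
from typing import List


def instance_create_command(
    name: str,
    zone: str,
    image_project: str = "batch-custom-image",        # assumed source project
    image_family: str = "batch-cos-stable-official",  # family named in the question
) -> List[str]:
    """Build (but do not run) a gcloud command that boots a VM from an image family."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--image-project={image_project}",
        f"--image-family={image_family}",
    ]


# "batch-cos-test" and "us-central1-b" are placeholder values.
cmd = instance_create_command("batch-cos-test", "us-central1-b")
print(shlex.join(cmd))
```

Printing the command first makes it easy to review before pasting it into a shell.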

We have some steps documented at https://github.com/GoogleCloudPlatform/batch-samples/tree/main/build-custom-image/build-cos-image

Could you see if that helps?


That is super useful. I’ll try it out right now.

Unfortunately, it’s not as straightforward as I had hoped. Here is my git diff:

diff --git a/build-custom-image/build-cos-image/batch_cos_image.yaml b/build-custom-image/build-cos-image/batch_cos_image.yaml
index 2f8e355..2536817 100644
--- a/build-custom-image/build-cos-image/batch_cos_image.yaml
+++ b/build-custom-image/build-cos-image/batch_cos_image.yaml
@@ -4,26 +4,26 @@ substitutions:
   # --------------------------------------------
   # Image name for your custom built image.
   # Image name needs to be unique in the project.
-  _IMAGE_NAME: "batch-cos-image"
+  _IMAGE_NAME: "foo-v1"
   # Image family for your custom built image.
-  _IMAGE_FAMILY: ""
+  _IMAGE_FAMILY: "foo"
   # Image project id your custom image builds based on.
   # Base image with COS image OS type,
   # such as Google standard COS image from project cos-cloud.
   # or Batch's latest COS image from project batch-custom-image.
-  _SOURCE_IMAGE_PROJECT_ID: "cos-cloud"
+  _SOURCE_IMAGE_PROJECT_ID: "batch-custom-image"
   # Image family that your custom image builds based on.
   # If you build image based on "cos-cloud" project, we recommend image family as "cos-stable".
   # If you build image based on "batch-custom-image" project,
   # we recommend image family as "batch-cos-stable-official"
-  _SOURCE_IMAGE_FAMILY: "cos-stable"
+  _SOURCE_IMAGE_FAMILY: "batch-cos-stable-official"
   # Machine type for you image building VM.
   # Any machine type that supports GPU is acceptable.
-  _MACHINE_TYPE: "n1-standard-1"
+  _MACHINE_TYPE: "n1-standard-16"
   # Disk size for your custom built image.
   _DISK_SIZE: "30"
   # Zone for your image building VM.
-  _ZONE: "us-central1-b"
+  _ZONE: "us-east1-b"
   # GPU Type for your image building VM.
   # Any type inside `gcloud compute accelerator-types list` is acceptable.
   _GPU_TYPE: "nvidia-tesla-t4"
@@ -41,7 +41,7 @@ substitutions:
   # For COS VMs, additional preceding command is required to be run on every VM reboot to configure GPU drivers,
   # detail in https://cloud.google.com/container-optimized-os/docs/how-to/run-gpus?#verify_the_installation.
   # Since the image build process needs VM rebooting, we by default disable the installation.
-  _INSTALL_GPU_PACKAGES: "false"
+  _INSTALL_GPU_PACKAGES: "true"

 steps:
 - name: 'gcr.io/cos-cloud/cos-customizer'
diff --git a/build-custom-image/env.sh b/build-custom-image/env.sh
index f66ac60..72a007a 100644
--- a/build-custom-image/env.sh
+++ b/build-custom-image/env.sh
@@ -2,4 +2,4 @@
 # Common variables for all image building.

 # Google Cloud project ID.
-ProjectID=batch-api-samples
+ProjectID=my-project

Ran into a snag:

Step #0: 2025/03/12 18:40:24 start_image_build.go:124: Using image batch-cos-stable-official-20250306-00-p00 from family batch-cos-stable-official
Finished Step #0
Starting Step #1
Step #1: Already have image (with digest): gcr.io/cos-cloud/cos-customizer
Finished Step #1
Starting Step #2
Step #2: Already have image (with digest): gcr.io/cos-cloud/cos-customizer
Step #2: 2025/03/12 18:40:25 finish_image_build.go:369: googleapi: Error 403: Required 'compute.images.get' permission for 'projects/my-project/global/images/foo-v1', forbidden
Finished Step #2
ERROR
ERROR: build step 2 "gcr.io/cos-cloud/cos-customizer" failed: step exited with non-zero status: 1

To be clear, my own user has the necessary permissions.

I’ve looked into the cos/tools repo (hosted on Git at Google). Its src/cmd/cos_customizer/README.md mentions specifying a service account, so maybe I have to create one so that the build doesn’t use the default GCE one.
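
Since the 403 is raised inside the build and not for my own user, the account running the build is the likely one missing compute.images.get. Below is a sketch that assembles (without running) a gcloud command to grant a role to that account. It assumes the build runs as the legacy Cloud Build service account, PROJECT_NUMBER@cloudbuild.gserviceaccount.com, which depends on the project's Cloud Build settings. The project number is a placeholder, and roles/compute.admin is only one predefined role that includes compute.images.get; the roles actually required are whatever the sample's prerequisites list.

```python
import shlex
from typing import List


def cloud_build_member(project_number: str) -> str:
    """IAM member string for the legacy Cloud Build service account (assumed here)."""
    return f"serviceAccount:{project_number}@cloudbuild.gserviceaccount.com"


def grant_role_command(
    project_id: str,
    project_number: str,
    role: str = "roles/compute.admin",  # assumption: broader than strictly needed
) -> List[str]:
    """Build (but do not run) a gcloud command that adds a project-level IAM binding."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member={cloud_build_member(project_number)}",
        f"--role={role}",
    ]


# "123456789012" is a hypothetical project number; "my-project" matches env.sh above.
cmd = grant_role_command("my-project", "123456789012")
print(shlex.join(cmd))
```

If the build runs as a different account, only the member string changes.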

Unfortunately, it’s not as straightforward as I had hoped. (I’ve tried posting extra info, but it won’t let me post a response.)

Hi @AvishaiW ,

Could you check whether you have the permissions described at https://github.com/GoogleCloudPlatform/batch-samples/tree/main/build-custom-image#prerequisites?

Thanks,

Wenyan