How to Increase Memory Size Using Custom G2 Instance Types in Google Cloud Batch

Hello Google Cloud Community,

I am currently working with Google Cloud Batch and I’m interested in increasing the memory size of G2 instance types using custom machine types. According to the documentation on accelerator-optimized machines (https://cloud.google.com/compute/docs/accelerator-optimized-machines?hl=ja#g2_limitations), it’s possible to customize the memory size for G2 instances. It also mentions that you can create VM instances with increased memory using gcloud commands.

However, I’m unclear on how to apply this to Google Cloud Batch jobs. Even though I specify a ComputeResource in my job configuration that should correspond to a custom size, my jobs continue to launch with the default g2-standard-8 settings. Here is the relevant part of the configuration I am using:

job {
  task_groups {
    task_spec {
      compute_resource {
        cpu_milli: 4000
        memory_mib: 30517
      }
      max_run_duration {
        seconds: 1209600
      }
      max_retry_count: 1
      runnables {
        container {
          image_uri: "ubuntu"
          commands: "/bin/bash"
          commands: "-c"
          commands: "sleep 3650d"
        }
      }
    }
  }
  allocation_policy {
    instances {
      policy {
        provisioning_model: SPOT
        accelerators {
          type_: "nvidia-l4"
          count: 1
        }
        boot_disk {
          size_gb: 30
        }
      }
      install_gpu_drivers: true
      install_ops_agent: true
    }
  }
  logs_policy {
    destination: CLOUD_LOGGING
  }
}

Could someone advise how to properly configure a Google Cloud Batch job to utilize a custom G2 machine type with increased memory?

Thank you!

Hi @Hiroshiba,

Welcome to Google Cloud Community!

Upon checking your configuration, it seems you were following the guideline for GPUs on N1 VMs. The GPU_TYPE and GPU_COUNT (accelerators) fields are only used to attach GPUs to N1 VMs. Instead, you should use the syntax for GPUs on accelerator-optimized VMs, where the GPU is determined by the machine type itself.

Here is an equivalent example in REST API syntax that deploys a custom G2 machine type:

{  
  "taskGroups": [
    {
      "taskCount": "1",
      "parallelism": "1",
      "taskSpec": {
        "computeResource": {
          "cpuMilli": "4000",
          "memoryMib": "20480"
        },
        "runnables": [
          {
            "container": {
              "imageUri": 
"gcr.io/xxx-xxx-xxx/test-batch@sha256:xxxxxxxxxxxxxxxxxxxxx",
              "entrypoint": "",
              "volumes": []
            }
          }
        ],
        "volumes": []
      }
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "installGpuDrivers": true,
        "policy": {
          "provisioningModel": "SPOT",
          "machineType": "g2-custom-4-20480"
        }
      }
    ]
  },
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}

To explain these syntax parameters:

  • In the runnables parameters, I created a container that runs a script to log info to Cloud Logging.
  • In the computeResource field:
    • cpuMilli: 4000 (1 vCPU is equal to 1000 milli-CPUs)
    • memoryMib: 20480 (1024 MiB x 20 = 20480 MiB, i.e. 20 GB of memory for the resource)
  • In the allocationPolicy parameters:
    • installGpuDrivers: true
    • provisioningModel: SPOT (a G2 limitation: G2 does not support live migration, so if this value is set to STANDARD you will get a Batch error)
    • machineType: g2-custom-4-20480
  • The rest of the configuration is left at its defaults.
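Since the configuration you posted looks like the Python client’s proto representation, here is a minimal sketch of the same fix using the google-cloud-batch library: the custom size is expressed through machine_type on the InstancePolicy, and no accelerators block is added because the L4 GPU is implied by the G2 machine type. Treat the project, region, and job name as placeholders and this as an untested outline, not a drop-in script.

from google.cloud import batch_v1

def create_g2_custom_job(project_id: str, region: str, job_name: str) -> batch_v1.Job:
    """Sketch: run a Batch job on a custom G2 machine type (4 vCPUs, 20 GB)."""
    client = batch_v1.BatchServiceClient()

    # Container runnable (same idea as the "ubuntu" container in the question).
    runnable = batch_v1.Runnable()
    runnable.container = batch_v1.Runnable.Container()
    runnable.container.image_uri = "ubuntu"
    runnable.container.commands = ["/bin/bash", "-c", "sleep 3650d"]

    task = batch_v1.TaskSpec()
    task.runnables = [runnable]
    # Request resources that fit inside the custom machine type chosen below.
    task.compute_resource = batch_v1.ComputeResource(cpu_milli=4000, memory_mib=20480)

    group = batch_v1.TaskGroup(task_count=1, task_spec=task)

    # The key change: name the custom G2 shape directly instead of adding
    # an accelerators block.
    policy = batch_v1.AllocationPolicy.InstancePolicy()
    policy.machine_type = "g2-custom-4-20480"
    policy.provisioning_model = batch_v1.AllocationPolicy.ProvisioningModel.SPOT

    instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate()
    instances.policy = policy
    instances.install_gpu_drivers = True

    allocation_policy = batch_v1.AllocationPolicy()
    allocation_policy.instances = [instances]

    job = batch_v1.Job()
    job.task_groups = [group]
    job.allocation_policy = allocation_policy
    job.logs_policy = batch_v1.LogsPolicy(
        destination=batch_v1.LogsPolicy.Destination.CLOUD_LOGGING
    )

    request = batch_v1.CreateJobRequest(
        parent=f"projects/{project_id}/locations/{region}",
        job_id=job_name,
        job=job,
    )
    return client.create_job(request)

If you prefer the REST route, you can also save the JSON above to a file and submit it with gcloud batch jobs submit using the --config flag; either way, the memory increase comes from the g2-custom-4-20480 machine type rather than from memoryMib alone.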

Sharing my test jobs that ran successfully using the above configuration.

For additional references that can help you understand more about Batch and about using the REST API for Compute Engine and Cloud Batch, you can refer to these documentation pages:

I hope the above information is helpful.