How to Increase Memory Size Using Custom G2 Instance Types in Google Cloud Batch

Hello Google Cloud Community,

I am currently working with Google Cloud Batch and I’m interested in increasing the memory size of G2 instance types using custom machine types. According to the documentation on accelerator-optimized machines (https://cloud.google.com/compute/docs/accelerator-optimized-machines?hl=ja#g2_limitations), it’s possible to customize the memory size for G2 instances. It also mentions that you can create VM instances with increased memory using gcloud commands.

However, I’m unclear on how to apply this to Google Cloud Batch jobs. Even though I specify a ComputeResource in my job configuration that should correspond to a custom size, my jobs continue to launch with the default g2-standard-8 settings. Here is the relevant part of the configuration I am using:

job {
  task_groups {
    task_spec {
      compute_resource {
        cpu_milli: 4000
        memory_mib: 30517
      }
      max_run_duration {
        seconds: 1209600
      }
      max_retry_count: 1
      runnables {
        container {
          image_uri: "ubuntu"
          commands: "/bin/bash"
          commands: "-c"
          commands: "sleep 3650d"
        }
      }
    }
  }
  allocation_policy {
    instances {
      policy {
        provisioning_model: SPOT
        accelerators {
          type_: "nvidia-l4"
          count: 1
        }
        boot_disk {
          size_gb: 30
        }
      }
      install_gpu_drivers: true
      install_ops_agent: true
    }
  }
  logs_policy {
    destination: CLOUD_LOGGING
  }
}

Could someone advise how to properly configure a Google Cloud Batch job to utilize a custom G2 machine type with increased memory?

Thank you!

Hi @Hiroshiba,

Welcome to Google Cloud Community!

Upon checking your configuration, it seems you were following the guideline for GPUs on N1 VMs. The GPU_TYPE and GPU_COUNT (accelerators) fields are only used to attach GPUs to N1 VMs. Instead, you should use the syntax for GPUs on accelerator-optimized VMs, where the GPU is determined by the machine type itself.

Here is an equivalent example in REST API syntax that deploys a custom G2 machine type:

{  
  "taskGroups": [
    {
      "taskCount": "1",
      "parallelism": "1",
      "taskSpec": {
        "computeResource": {
          "cpuMilli": "4000",
          "memoryMib": "20480"
        },
        "runnables": [
          {
            "container": {
              "imageUri": 
"gcr.io/xxx-xxx-xxx/test-batch@sha256:xxxxxxxxxxxxxxxxxxxxx",
              "entrypoint": "",
              "volumes": []
            }
          }
        ],
        "volumes": []
      }
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "installGpuDrivers": true,
        "policy": {
          "provisioningModel": "SPOT",
          "machineType": "g2-custom-4-20480"
        }
      }
    ]
  },
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}

To explain these syntax parameters:

  • In the runnables parameters, I created a container that runs a script to log info to Cloud Logging.
  • In the computeResource field:
    • cpuMilli: 4000 (1 vCPU is equal to 1000 milli-CPUs)
    • memoryMib: 20480 (1024 MiB x 20 = 20480 MiB, i.e. 20 GB of memory for the resource)
  • In the allocationPolicy parameters:
    • installGpuDrivers: true
    • provisioningModel: SPOT (a G2 limitation: G2 does not support live migration, so if this value is set to STANDARD you will get a Batch error)
    • machineType: g2-custom-4-20480
  • The rest of the configuration is left at its defaults.
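Since the configuration you posted looks like the Python client’s proto representation, here is a minimal sketch of the same fix using the google-cloud-batch library: the custom size is expressed through machine_type on the InstancePolicy, and no accelerators block is added because the L4 GPU is implied by the G2 machine type. Treat the project, region, and job name as placeholders and this as an untested outline, not a drop-in script.

from google.cloud import batch_v1

def create_g2_custom_job(project_id: str, region: str, job_name: str) -> batch_v1.Job:
    """Sketch: run a Batch job on a custom G2 machine type (4 vCPUs, 20 GB)."""
    client = batch_v1.BatchServiceClient()

    # Container runnable (same idea as the "ubuntu" container in the question).
    runnable = batch_v1.Runnable()
    runnable.container = batch_v1.Runnable.Container()
    runnable.container.image_uri = "ubuntu"
    runnable.container.commands = ["/bin/bash", "-c", "sleep 3650d"]

    task = batch_v1.TaskSpec()
    task.runnables = [runnable]
    # Request resources that fit inside the custom machine type chosen below.
    task.compute_resource = batch_v1.ComputeResource(cpu_milli=4000, memory_mib=20480)

    group = batch_v1.TaskGroup(task_count=1, task_spec=task)

    # The key change: name the custom G2 shape directly instead of adding
    # an accelerators block.
    policy = batch_v1.AllocationPolicy.InstancePolicy()
    policy.machine_type = "g2-custom-4-20480"
    policy.provisioning_model = batch_v1.AllocationPolicy.ProvisioningModel.SPOT

    instances = batch_v1.AllocationPolicy.InstancePolicyOrTemplate()
    instances.policy = policy
    instances.install_gpu_drivers = True

    allocation_policy = batch_v1.AllocationPolicy()
    allocation_policy.instances = [instances]

    job = batch_v1.Job()
    job.task_groups = [group]
    job.allocation_policy = allocation_policy
    job.logs_policy = batch_v1.LogsPolicy(
        destination=batch_v1.LogsPolicy.Destination.CLOUD_LOGGING
    )

    request = batch_v1.CreateJobRequest(
        parent=f"projects/{project_id}/locations/{region}",
        job_id=job_name,
        job=job,
    )
    return client.create_job(request)

If you prefer the REST route, you can also save the JSON above to a file and submit it with gcloud batch jobs submit using the --config flag; either way, the memory increase comes from the g2-custom-4-20480 machine type rather than from memoryMib alone.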

Sharing my test jobs that ran successfully using the above configuration.

For additional references that can help you understand more about Batch and about using the REST API for Compute Engine and Cloud Batch, you can refer to these documentation pages:

I hope the above information is helpful.