Does GCP Batch support running a Python program with different arguments in parallel?

I want to run a job of 10,000 parallel tasks, each of the form “/path/to/python/interpreter /path/to/python/file arg”, where arg is an arbitrary string that differs for each task. What would be the easiest way to do that? I know that each task is assigned a different task_id environment variable, but that seems pretty restrictive.

@Wen_gcp @wenyhu @bolianyin @Shamel Dear staff members, could you help me when you see this?

Hi @gradientopt,

If arg takes only a limited number of values across tasks, you can set an Environment for each of those values: https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#Environment.

If arg is different for each task and you have a large number of tasks, you could create a file that contains one string per line, one line per task, and have each task use BATCH_TASK_INDEX as the line number to read from that file.
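A minimal sketch of that lookup in Python, in case it helps. It assumes the file is readable from every task (for example on a mounted storage volume), that it holds one argument per line, and that BATCH_TASK_INDEX starts at 0. The ARGS_FILE variable, the args.txt default, and the function name are hypothetical, not part of Batch.

```python
import os
from itertools import islice


def arg_for_task(args_path: str, task_index: int) -> str:
    """Return the line at zero-based position task_index, without its newline."""
    with open(args_path, encoding="utf-8") as f:
        # islice skips ahead lazily, so the whole file is never held in memory.
        line = next(islice(f, task_index, task_index + 1), None)
    if line is None:
        raise IndexError(f"{args_path} has no line for task index {task_index}")
    return line.rstrip("\r\n")


def main() -> None:
    # BATCH_TASK_INDEX is set by Batch for each task.
    task_index = int(os.environ["BATCH_TASK_INDEX"])
    # ARGS_FILE is a hypothetical variable naming the shared argument file.
    args_path = os.environ.get("ARGS_FILE", "args.txt")
    arg = arg_for_task(args_path, task_index)
    # Replace this print with the real work that needs the argument.
    print(f"task {task_index} got arg: {arg}")


# Only run automatically when the script is started inside a Batch task.
if __name__ == "__main__" and "BATCH_TASK_INDEX" in os.environ:
    main()
```

With this approach the job only needs a task count equal to the number of lines in the file; the arguments themselves stay out of the job definition.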

For the per-task environment approach, see the taskEnvironments field in TaskGroup: https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#taskgroup.
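As a rough illustration of the per-task environment approach, the sketch below builds a JSON job configuration in which each task gets its own value through taskEnvironments. The variable name ARG, the helper build_job, and the example values are hypothetical. The field names follow my reading of the REST reference linked above, so please check them against it before relying on this.

```python
import json


def build_job(arg_values, parallelism=2):
    """Build a job config dict with one task per entry in arg_values."""
    # Each task runs the same command; only the ARG variable differs.
    script = '/path/to/python/interpreter /path/to/python/file "$ARG"'
    return {
        "taskGroups": [
            {
                "taskSpec": {"runnables": [{"script": {"text": script}}]},
                # One Environment per task, matched to tasks by index.
                "taskEnvironments": [
                    {"variables": {"ARG": value}} for value in arg_values
                ],
                "parallelism": parallelism,
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_job(["red", "green", "blue"]), indent=2))
```

This is only practical when the list of values is small; for thousands of distinct strings the file-based lookup is likely the better fit.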

Ref: https://cloud.google.com/batch/docs/create-run-basic-job#create-job-environment-variables.

Hope this helps!

Got it, thanks a lot!
