I want to run a job with 10,000 parallel tasks, each of the form “/path/to/python/interpreter /path/to/python/file arg”, where arg is an arbitrary string that differs for each task. What would be the easiest way to do that? I know that each task is assigned a different task_id environment variable, but that seems pretty restrictive.
@Wen_gcp @wenyhu @bolianyin @Shamel Dear staff members, could you help me when you see this?
Hi @gradientopt ,
If arg only takes a limited set of values across tasks, you can set an Environment for each arg option: https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#Environment.
If arg is different for each task and you have a large number of tasks, you could create a file with one string per line (one line per task) and have each task use BATCH_TASK_INDEX to pick which line of that file to read.
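A minimal sketch of the file-based approach, assuming the args file is reachable from every task (for example on a mounted bucket) and that BATCH_TASK_INDEX is zero-based. The wrapper name, the helper `arg_for_task`, and the command-line layout are hypothetical, not part of Batch:

```python
import os
import subprocess
import sys
from itertools import islice


def arg_for_task(args_path, task_index):
    """Return line `task_index` (zero-based) of `args_path`, without its trailing newline."""
    with open(args_path, encoding="utf-8") as f:
        # islice skips ahead lazily, so the whole file is never held in memory.
        line = next(islice(f, task_index, task_index + 1), None)
    if line is None:
        raise IndexError(f"{args_path} has no line {task_index}")
    return line.rstrip("\n")


def main():
    # Hypothetical invocation: wrapper.py ARGS_FILE SCRIPT
    args_path, script = sys.argv[1], sys.argv[2]
    # Batch sets BATCH_TASK_INDEX for each task; assumed here to start at 0.
    task_index = int(os.environ["BATCH_TASK_INDEX"])
    arg = arg_for_task(args_path, task_index)
    # Run the real script with the same interpreter, passing arg as a single argv entry.
    subprocess.run([sys.executable, script, arg], check=True)


# Only run when invoked with both arguments, so importing the module has no side effects.
if __name__ == "__main__" and len(sys.argv) == 3:
    main()
```

Each task would then run something like “/path/to/python/interpreter wrapper.py /path/to/args.txt /path/to/python/file”, and the job's task count would match the number of lines in the file.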
You can also refer to the taskEnvironments field in TaskGroup: https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#taskgroup.
Ref: https://cloud.google.com/batch/docs/create-run-basic-job#create-job-environment-variables.
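For the limited-options case, here is a rough sketch of building a job config with one Environment per task. The field names (`taskGroups`, `taskSpec`, `runnables`, `script.text`, `taskEnvironments`, `variables`) follow my reading of the REST reference linked above and should be verified there, including any limit on the list length. The variable name `ARG` and the helper `build_job_config` are hypothetical:

```python
import json


def build_job_config(arg_options, command):
    """Build a Batch job config dict with one task environment per entry in `arg_options`."""
    return {
        "taskGroups": [
            {
                "taskSpec": {
                    "runnables": [
                        # The script reads its argument from ARG (a hypothetical variable name).
                        {"script": {"text": f'{command} "$ARG"'}}
                    ]
                },
                # One Environment object per task, each carrying its own ARG value.
                "taskEnvironments": [
                    {"variables": {"ARG": option}} for option in arg_options
                ],
            }
        ]
    }


if __name__ == "__main__":
    config = build_job_config(
        ["small", "medium", "large"],
        "/path/to/python/interpreter /path/to/python/file",
    )
    # The printed JSON can be saved to a file and used as the job config.
    print(json.dumps(config, indent=2))
```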
Hope this helps!
Got it, thanks a lot!