Batch task scheduling inefficiency

Hi, I noticed an inefficiency in Batch task scheduling. Suppose I have 14 tasks and each machine runs 4 tasks. Batch schedules only 12 tasks to run in parallel and runs the remaining 2 only after those 12 have completed.

So in total it takes 2 * per_task_time to finish instead of just 1 * per_task_time (which is what it would take if all 14 tasks were scheduled at the same time).
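
To make the arithmetic concrete, here is a minimal sketch that models the behavior described above. It is only a model of what I observe, not Batch's documented scheduling algorithm, and the function names (`observed_waves`, `ideal_waves`) are hypothetical.

```python
import math


def observed_waves(task_count: int, tasks_per_machine: int) -> int:
    """Waves needed under the observed behavior (an assumption based on this
    post): only full machines' worth of tasks start first, and any remainder
    runs in a later wave."""
    full = (task_count // tasks_per_machine) * tasks_per_machine
    remainder = task_count - full
    return (1 if full else 0) + (1 if remainder else 0)


def ideal_waves(task_count: int) -> int:
    """Waves needed if every task can start at once (enough quota/capacity)."""
    return 1 if task_count > 0 else 0


task_count, tasks_per_machine = 14, 4
machines_needed = math.ceil(task_count / tasks_per_machine)

print("machines needed to run everything at once:", machines_needed)  # 4
print("observed:", observed_waves(task_count, tasks_per_machine), "* per_task_time")  # 2
print("ideal:", ideal_waves(task_count), "* per_task_time")  # 1
```

With 14 tasks and 4 tasks per machine, the fourth machine would only be half full, which seems to be why the last 2 tasks wait.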

Can the team help me look into how to resolve this? Thanks!

What value do you have set for the parallelism parameter in the job configuration file? This is what controls how many tasks can run concurrently. Have you tried increasing the number?
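
For reference, a minimal sketch of where that field usually sits in a job configuration, built as a Python dict and printed as JSON. The values are illustrative only (taken from the numbers in the question, not from the poster's actual job), and the `echo` script is a placeholder.

```python
import json

# Illustrative job configuration; all values are assumptions for this example.
job = {
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {"script": {"text": "echo task ${BATCH_TASK_INDEX}"}}
                ]
            },
            "taskCount": 14,        # total number of tasks in the group
            "taskCountPerNode": 4,  # tasks that may run on one VM at a time
            "parallelism": 14,      # upper bound on tasks running concurrently
        }
    ]
}

print(json.dumps(job, indent=2))
```

If `parallelism` is lower than `taskCount`, some tasks will wait regardless of how many machines are available.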

My parallelism is set to 600, so I do not think that is relevant. I think the issue is that Batch tries to saturate each machine. If I have 4n+2 tasks and each machine can run 4 tasks, it always leaves 2 tasks until later. I have encountered several situations like this.

Can you share your job configuration?

If quota/capacity is available, Batch should run all tasks in parallel in this case. If you have a recent example, could you send the job UID to gcp-batch-preview@google.com with a link to this post so we can investigate?