How are per-task resources constrained?

thessjacob · December 17, 2024, 4:51pm

A quick technical question.

When multiple tasks are running on the same instance, how are the resources for each task isolated from one another? If I run lscpu or nvidia-smi, for example, the commands return all CPUs and all GPUs on the instance. Is this purely cosmetic, with the batch scheduler enforcing boundaries between tasks (maybe using cgroups?), or can some commands break out of the limits? Any clarity would be helpful.
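One way to check this from inside a task is to compare what the machine reports with what the process is actually allowed to use. The following is a minimal sketch, assuming a Linux instance with cgroup v2 mounted at /sys/fs/cgroup; the helper names `parse_cpu_max` and `effective_cpu_view` are made up for illustration and are not part of any scheduler API.

```python
import os
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse the contents of a cgroup v2 cpu.max file.

    The file holds "<quota> <period>" in microseconds, or "max <period>"
    when no quota is set. Returns the quota in CPUs, or None if unlimited.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def effective_cpu_view(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> dict:
    """Collect the CPU counts seen through three different mechanisms."""
    # CPUs the kernel reports for the whole machine (roughly what lscpu shows).
    visible = os.cpu_count()

    # CPUs this process may be scheduled on (Linux only).
    try:
        allowed = len(os.sched_getaffinity(0))
    except AttributeError:
        allowed = None

    # CPU bandwidth quota from the cgroup, if the file exists.
    # None means either "no quota" or "file not found" (assumed path).
    try:
        with open(cpu_max_path) as f:
            quota = parse_cpu_max(f.read())
    except OSError:
        quota = None

    return {"visible": visible, "allowed": allowed, "cgroup_cpu_quota": quota}


if __name__ == "__main__":
    print(effective_cpu_view())
```

If `allowed` equals `visible` and no quota is reported, nothing at the affinity or cgroup level is restricting the process's CPU usage.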

Best,
Jacob

bolianyin · December 17, 2024, 8:19pm

Only tasks from the same job can run on the same instance. There is currently no isolation between tasks and no cgroup-based limits on them. Tasks are free to compete for and use all resources available on the instance.
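Since nothing is enforced, tasks that should not compete have to partition the instance cooperatively. Below is a minimal sketch of one such approach, assuming Linux and assuming each task already knows its local slot number on the instance. How that slot is determined is left open here, and `cpu_slice` and `pin_to_slice` are hypothetical helper names, not a Batch feature.

```python
import os
from typing import Iterable, Set


def cpu_slice(cpus: Iterable[int], slot: int, slots: int) -> Set[int]:
    """Return the share of `cpus` belonging to task `slot` out of `slots`.

    CPUs are split into equal contiguous chunks; any remainder left over
    when the count is not divisible by `slots` stays unassigned.
    """
    if not 0 <= slot < slots:
        raise ValueError("slot must be in range [0, slots)")
    ordered = sorted(cpus)
    per_task = len(ordered) // slots
    if per_task == 0:
        raise ValueError("more slots than CPUs")
    return set(ordered[slot * per_task:(slot + 1) * per_task])


def pin_to_slice(slot: int, slots: int) -> Set[int]:
    """Pin the current process to its slice (Linux only).

    This is voluntary: the process, or anything it launches, can widen
    the affinity mask again, so it is not isolation.
    """
    chosen = cpu_slice(os.sched_getaffinity(0), slot, slots)
    os.sched_setaffinity(0, chosen)
    return chosen
```

For example, on an 8-CPU instance running 4 tasks, the task in slot 1 would call `pin_to_slice(1, 4)` at startup and restrict itself to CPUs 2 and 3. For GPUs, a comparable convention is to set `CUDA_VISIBLE_DEVICES` before the task initializes CUDA. nvidia-smi will still list every GPU on the instance, because it does not honor that variable.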

thessjacob · December 17, 2024, 8:53pm

Great to know, thank you!

thessjacob · December 17, 2024, 8:55pm

I guess I should ask: is there any roadmap item or feature request to provide that kind of isolation (à la Slurm or other HPC schedulers), or do you see it as unnecessary, since it’s possible to simply restrict a job to 1 task per node if the competition is a problem?

bolianyin · December 17, 2024, 9:32pm

Nothing is on the roadmap yet, and we haven’t seen enough requests to add this feature. There are certain benefits to not having limits. Some customers want to overcommit CPUs, for example, because their tasks don’t all need the subscribed CPUs at the same time.

thessjacob · December 17, 2024, 10:05pm

Sounds good to me. We can work around it either way; I just wanted to know what to plan for. Thanks again!
