How are per-task resources constrained?

A quick technical question.

When multiple tasks are running on the same instance, how are each task's resources isolated from one another? For example, if I run lscpu or nvidia-smi, the commands report every CPU and every GPU on the instance. Is this purely cosmetic, with the batch scheduler enforcing boundaries between tasks (maybe using cgroups?), or can some commands break out of the limits? Any clarity would be helpful.

Best,
Jacob

Only tasks from the same job can run on the same instance. There is currently no per-task isolation, and no cgroup limits are applied to tasks. Tasks are free to compete for, and use, all resources available on the instance.
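For anyone finding this thread later: a minimal sketch of how a task could check for itself which CPU limits actually apply. It is not part of the batch service. It assumes Linux, cgroup v2 mounted at `/sys/fs/cgroup`, and that the process's cgroup is the root of that mount (as inside a typical container); on other layouts the file path would need to come from `/proc/self/cgroup`. GPUs are not covered.

```python
import os


def parse_cpu_max(text):
    """Parse a cgroup v2 cpu.max value ("<quota> <period>" or "max <period>").

    Returns the CPU limit in CPUs (quota / period), or None when unlimited.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def describe_cpu_limits(cpu_max_path="/sys/fs/cgroup/cpu.max"):
    """Report CPUs on the machine, CPUs this process may run on, and any cgroup CPU quota."""
    online = os.cpu_count()                  # every CPU the OS reports (what lscpu shows)
    usable = len(os.sched_getaffinity(0))    # CPUs this process is allowed to run on (Linux only)
    try:
        with open(cpu_max_path) as f:
            limit = parse_cpu_max(f.read())
    except (OSError, ValueError):
        limit = None  # file missing, unreadable, or unexpected format: treat as "no known limit"
    return {"online_cpus": online, "affinity_cpus": usable, "cgroup_cpu_limit": limit}


if __name__ == "__main__":
    print(describe_cpu_limits())
```

If `affinity_cpus` equals `online_cpus` and `cgroup_cpu_limit` is `None`, nothing below the instance level is restricting the task, which matches the answer above.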


Great to know, thank you!

I guess I should ask: is there any roadmap or feature request to provide that kind of isolation (à la Slurm or other HPC schedulers), or do you see it as unnecessary, since it's possible to simply restrict each node to one task if the competition is a problem?

Nothing is on the roadmap yet, and we haven't seen enough requests to add this feature. Running without limits also has some benefits. For example, some customers want to overcommit CPUs, because their tasks don't all need their full CPU allocation at the same time.


Sounds good to me. We can work around it either way; we just wanted to know what to plan for. Thanks again!
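If you do pack several tasks onto one instance, one cooperative workaround is for each task to pin itself to its own slice of CPUs and GPUs. This is a rough sketch under stated assumptions, not a feature of the scheduler: nothing enforces it, so a misbehaving task can still use everything. `TASK_SLOT`, `TASKS_PER_NODE` and `NUM_GPUS` are hypothetical environment variables; you would need your own way of giving each task on a node a distinct slot number. `CUDA_VISIBLE_DEVICES` only affects CUDA applications and must be set before CUDA is initialized.

```python
import os


def partition(items, num_parts, part_index):
    """Split items into num_parts contiguous, near-equal slices and return slice part_index."""
    if not 0 <= part_index < num_parts:
        raise ValueError("part_index must be in [0, num_parts)")
    base, extra = divmod(len(items), num_parts)
    start = part_index * base + min(part_index, extra)
    end = start + base + (1 if part_index < extra else 0)
    return items[start:end]


def apply_task_slice(task_slot, tasks_per_node, num_gpus):
    """Pin this process to its CPU slice and expose only its GPU slice to CUDA (cooperative only)."""
    cpus = sorted(os.sched_getaffinity(0))           # CPUs currently available (Linux only)
    my_cpus = partition(cpus, tasks_per_node, task_slot)
    if my_cpus:                                      # an empty set would be rejected by the kernel
        os.sched_setaffinity(0, my_cpus)
    my_gpus = partition(list(range(num_gpus)), tasks_per_node, task_slot)
    # An empty string hides all GPUs from CUDA; child processes inherit this variable.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in my_gpus)
    return my_cpus, my_gpus


if __name__ == "__main__":
    # Hypothetical settings: how a task learns its node-local slot is left to you.
    slot = int(os.environ.get("TASK_SLOT", "0"))
    per_node = int(os.environ.get("TASKS_PER_NODE", "1"))
    gpus = int(os.environ.get("NUM_GPUS", "0"))
    print(apply_task_slice(slot, per_node, gpus))
```

Contiguous slices keep each task's CPUs adjacent, which often (but not always) keeps them on the same socket; a topology-aware split would need more care.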