Tag Archive for slurm

Slurm: using GPU sharding

I cannot use GPU sharding, even though everything seems to have been configured according to the instructions.
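For comparison, a minimal sharding setup looks roughly like the sketch below (the node name gpu-node01 and the counts are placeholders, not the asker's actual configuration); both slurmctld and the slurmd on the node need to be restarted or reconfigured after the change.

    # slurm.conf -- declare the shard gres and advertise it on the node
    GresTypes=gpu,shard
    NodeName=gpu-node01 Gres=gpu:1,shard:4

    # gres.conf on gpu-node01 -- one physical GPU split into 4 shards
    Name=gpu File=/dev/nvidia0
    Name=shard Count=4

Jobs then request a slice of the GPU with something like srun --gres=shard:1, while --gres=gpu:1 still takes the whole device. If sharding still fails with a configuration like this, comparing the Gres line in scontrol show node output against what slurm.conf advertises is a good first check.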

How to get an estimate of when a job is going to start according to the current schedule?

I want to find out when my jobs are going to start. According to the docs this should be possible with squeue --start; however, the start times seem to be N/A until the job actually starts, and even then it is just a date. I would like an estimate, based on the current state of the queue, of how many minutes/hours/days it will be until SLURM executes my job.
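Note that squeue --start only reports an absolute StartTime, and typically only after the backfill scheduler has evaluated the pending job, which is why it can show N/A for a while; sbatch --test-only can also print an estimated start time without actually submitting. A small wrapper like the sketch below (assuming Slurm's default timestamp format %Y-%m-%dT%H:%M:%S and a job id passed as the first argument) turns that timestamp into a remaining-time estimate.

    #!/usr/bin/env python3
    """Turn squeue's estimated StartTime into a remaining-time estimate.

    Sketch only: assumes the default ISO-like timestamp format and that
    the job id is passed as the first command-line argument.
    """
    import subprocess
    import sys
    from datetime import datetime


    def estimated_wait(jobid):
        # %S asks squeue for the scheduler's expected start time of the job.
        out = subprocess.run(
            ["squeue", "--start", "-j", jobid, "--noheader", "-o", "%S"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        if not out or out == "N/A":
            return None  # the backfill scheduler has not produced an estimate yet
        start = datetime.strptime(out, "%Y-%m-%dT%H:%M:%S")
        return start - datetime.now()


    if __name__ == "__main__":
        wait = estimated_wait(sys.argv[1])
        print(f"estimated wait: {wait}" if wait else "no estimate available yet")

The estimate shifts whenever higher-priority jobs enter the queue or running jobs finish early, so it is only a rough indication rather than a guarantee.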

SLURM jobs between partitions are not suspended

I have two slurm partitions (lhpc and lgpu) with a shared node (n16-90), and I have configured one partition with higher priority. I want that when a job submitted to the lgpu partition needs the shared node while an lhpc job is already running on it, the lhpc job is suspended so the lgpu job can allocate the node.
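One way to get that behaviour is partition-based preemption with gang scheduling, giving lgpu the higher PriorityTier and marking lhpc jobs as suspendable. The sketch below only lists the shared node n16-90 and omits any other nodes or partition options the real cluster may have.

    # slurm.conf
    PreemptType=preempt/partition_prio
    PreemptMode=SUSPEND,GANG

    # jobs in the higher PriorityTier partition preempt jobs in the lower one
    PartitionName=lgpu Nodes=n16-90 PriorityTier=2 PreemptMode=off
    PartitionName=lhpc Nodes=n16-90 PriorityTier=1 PreemptMode=suspend

If PreemptType is left at the default preempt/none, or the partitions only differ in scheduling priority without distinct PriorityTier values, higher-priority jobs simply queue behind the running job instead of suspending it.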

How does enroot share its image cache and data across multiple nodes?

Currently I have multiple GPU nodes pooled through slurm, and enroot.conf uses the default configuration. As a result, an image pulled by enroot is only cached on the node where it was pulled; when a task runs on another node, the image has to be pulled again, which wastes time.
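A common workaround is to point the enroot cache, and optionally the data directory, at shared storage in enroot.conf. In the sketch below, /shared/enroot is a hypothetical path on a filesystem mounted on every node (NFS, Lustre, etc.).

    # /etc/enroot/enroot.conf
    ENROOT_CACHE_PATH /shared/enroot/cache   # downloaded image layers, safe to share
    ENROOT_DATA_PATH  /shared/enroot/data    # unpacked container root filesystems

Sharing ENROOT_DATA_PATH across nodes that run containers concurrently can cause conflicts, so another possible pattern is to keep the data path local and instead import the image once to a squashfs file on shared storage, e.g. enroot import -o /shared/images/app.sqsh docker://<image>, and start containers from that file on every node.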