How to run PyTorch Lightning with multiple GPUs, with Apptainer and SLURM?
When using 2 GPUs on a single node, or multiple GPUs across multiple nodes, the training does not start even though the job keeps running. I use a container (Apptainer) to deploy the environment and then submit the script to SLURM. The job starts but then stalls. I also tried strategy='deepspeed'.
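For reference, here is a minimal, self-contained version of the kind of script I am submitting (a sketch with placeholder model and data, not my actual code; I assume Lightning 2.x, where the Trainer lives in the lightning package):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import lightning as L


class ToyModel(L.LightningModule):
    """Placeholder for the real model: a single linear layer."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


def main():
    # Placeholder random data standing in for the real dataset.
    data = TensorDataset(torch.randn(256, 32), torch.randn(256, 1))
    loader = DataLoader(data, batch_size=32)

    trainer = L.Trainer(
        accelerator="gpu",
        devices=2,      # GPUs per node; should match the SLURM allocation
        num_nodes=1,    # 2 (or more) for the multi-node case
        strategy="ddp",
        max_epochs=1,
    )
    trainer.fit(ToyModel(), loader)


if __name__ == "__main__":
    main()
```

My understanding is that under SLURM, Lightning does not spawn the extra ranks itself: the batch script is expected to launch one task per GPU via srun (e.g. --ntasks-per-node matching devices and --nodes matching num_nodes), with apptainer exec wrapping the python call inside the srun command.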
Script freezes when PyTorch Lightning’s Trainer is instantiated
I’m trying to train a model using PyTorch Lightning on a cluster running Ubuntu 20.04. However, the code freezes when the lightning.Trainer is instantiated. There are no error messages; it just freezes, and the program never exits.
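In case it helps, this is roughly the point where it hangs, with some standard PyTorch/NCCL debug switches enabled (a sketch; these environment variables are generic debugging knobs, not something from my actual script):

```python
import os

# Standard PyTorch/NCCL debug knobs: set before importing torch/lightning
# so the distributed setup logs what it is doing instead of hanging silently.
os.environ.setdefault("NCCL_DEBUG", "INFO")
os.environ.setdefault("TORCH_DISTRIBUTED_DEBUG", "DETAIL")
os.environ.setdefault("TORCH_CPP_LOG_LEVEL", "INFO")

import lightning as L

# If these SLURM variables are set, Lightning auto-detects a SLURM cluster
# and derives the world size from them.
print("SLURM_NTASKS          =", os.environ.get("SLURM_NTASKS"))
print("SLURM_NTASKS_PER_NODE =", os.environ.get("SLURM_NTASKS_PER_NODE"))
print("SLURM_PROCID          =", os.environ.get("SLURM_PROCID"))

trainer = L.Trainer(accelerator="gpu", devices=2, strategy="ddp")  # <- freezes here
print("Trainer created on rank", trainer.global_rank)
```

From what I have read, a silent hang like this usually means the processes are waiting in the distributed rendezvous for peers that were never launched, e.g. when the number of tasks srun starts does not match devices × num_nodes, but I have not been able to confirm that this is what is happening here.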