
Tag Archive for python, pytorch, pytorch-lightning, multi-gpu

Why is FSDP not sharding my model between GPUs?

I have a model that is too large to fit on a single GPU, even with a batch size of 1. After looking into it, FSDP seems to be the proper way to handle this. Below is some code where I manually wrap a few of my layers and check the amount of GPU memory in use before and after the modules that consume most of it. From this I saw that only one of the two available GPUs was actually being used, while the other had only 3 bytes of memory allocated on it.
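For context, here is a minimal sketch of the manual-wrapping setup I am describing. The layer sizes and names are placeholders, not my real model; the point is the structure: a process group is initialized first, some layers are wrapped individually in FSDP, and the whole module gets an outer FSDP wrap, with a per-rank memory printout to confirm that each GPU holds a shard:

```python
import os

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    # torchrun sets RANK/WORLD_SIZE/LOCAL_RANK; init_process_group reads them.
    torch.distributed.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder layers standing in for the real model.
    model = nn.Sequential(
        FSDP(nn.Linear(4096, 4096).cuda()),  # manually wrapped layer
        nn.ReLU(),
        FSDP(nn.Linear(4096, 4096).cuda()),  # manually wrapped layer
    )
    model = FSDP(model)  # outer wrap shards any remaining parameters

    # Check how much memory this rank's GPU is actually holding.
    print(f"rank {local_rank}: {torch.cuda.memory_allocated(local_rank)} bytes allocated")

    torch.distributed.destroy_process_group()


if __name__ == "__main__":
    main()
```

This has to be launched with one process per GPU, e.g. `torchrun --nproc_per_node=2 script.py`; launched as a plain single process, the world size is 1 and FSDP has nothing to shard across.

This is my model: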