Airflow execution of Torch job resulting in CUDA error – ‘forked subprocess’
I have an internal package that runs scoring using nvidia CUDA GPU acceleration across all available GPUs in my environment leveraging transformers, torch, and accelerate packages. The package runs without any errors when I run in a session (e.g. pycharm manual execution). When it is scheduled with Airflow, I get the following error when the model is loaded into memory: