
Tag Archive for python, pytorch, cuda, gpu, llama

PyTorch uses wrong GPU

I am using a laptop with two GPUs. The first is an Intel(R) UHD Graphics and the second is an NVIDIA GeForce RTX 4090 Laptop GPU.
I want to run a RAG application with a llama3 model on the second GPU. When I run it, the model is loaded into memory. After that, the second GPU is used only until streaming of the output starts; while streaming, only the first GPU is used. I can see this by running the application and watching the GPU performance graphs in Task Manager.
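
One way to cross-check what Task Manager shows is to ask PyTorch directly which CUDA devices it sees and where the model's weights live. The following is only a minimal sketch: `cuda_report` and `tensor_devices` are hypothetical helper names, and it assumes the application loads the model as a PyTorch module. CUDA enumerates NVIDIA devices only, so the Intel GPU should not appear in this list at all, and the index PyTorch reports need not match the "GPU 0" / "GPU 1" numbering in Task Manager.

```python
import importlib.util


def cuda_report():
    """Describe what PyTorch can see; returns safe defaults if torch is missing."""
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "devices": [],
    }
    if not report["torch_installed"]:
        return report

    import torch  # imported lazily so the helper also runs without torch

    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        # Only CUDA-capable (NVIDIA) devices are enumerated here.
        for index in range(torch.cuda.device_count()):
            report["devices"].append(torch.cuda.get_device_name(index))
    return report


def tensor_devices(module):
    """Collect the devices that hold a module's parameters, e.g. {'cuda:0'}."""
    return {str(param.device) for param in module.parameters()}


if __name__ == "__main__":
    print(cuda_report())
```

If `tensor_devices(model)` (with `model` standing in for the loaded llama3 module) still reports only `cuda:0` while tokens are streaming, generation is most likely still running on the NVIDIA GPU, and the activity seen on the Intel GPU may come from something else, such as rendering the output window. Task Manager's default graphs may also not show CUDA compute load, so the readings there are worth double-checking against `nvidia-smi`.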
