Should nested modules with shared weights be an nn.Module object parameter or not?
I would like two torch.nn.Module classes to share part of their architecture and weights, as in the example below:
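A minimal sketch of one common pattern, where the shared part is a single submodule instance passed into both wrappers (all names here are illustrative, not taken from the original post):

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    # Shared trunk whose weights both heads reuse.
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 32)

    def forward(self, x):
        return torch.relu(self.fc(x))

class HeadA(nn.Module):
    def __init__(self, backbone):
        super().__init__()
        # Assigning the same Backbone instance registers it as a
        # submodule; its parameters are shared, not copied.
        self.backbone = backbone
        self.out = nn.Linear(32, 1)

    def forward(self, x):
        return self.out(self.backbone(x))

class HeadB(nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone
        self.out = nn.Linear(32, 10)

    def forward(self, x):
        return self.out(self.backbone(x))

shared = Backbone()
a, b = HeadA(shared), HeadB(shared)
# Both heads see the very same weight tensor:
assert a.backbone.fc.weight is b.backbone.fc.weight
```

One caveat with this pattern: since each head registers the shared module, each head's state_dict stores its own copy of the shared weights.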
Efficient PyTorch band matrix to dense matrix multiplication
Problem: In one of my programs, I need to calculate a matrix multiplication A @ B where both matrices are of size N by N for considerably large N. I'm conjecturing that approximating this product with band_matrix(A, width) @ B could suffice, where band_matrix(A, width) denotes the band-matrix part of A with width width. For example, width = 0 gives the diagonal matrix with diagonal elements taken from A, and width = 1 gives the tridiagonal matrix taken in a similar manner.
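A minimal sketch of one way to write such a band_matrix helper (it is a hypothetical function, not a PyTorch builtin) using torch.triu and torch.tril. Note that zeroing out entries in a dense tensor does not by itself speed up the product; an actual speedup would need banded storage or a custom kernel:

```python
import torch

def band_matrix(A, width):
    # Keep entries A[i, j] with |i - j| <= width, zero out the rest.
    # width = 0 -> diagonal, width = 1 -> tridiagonal, etc.
    return torch.tril(torch.triu(A, diagonal=-width), diagonal=width)

N = 1024
A = torch.randn(N, N)
B = torch.randn(N, N)

approx = band_matrix(A, width=1) @ B   # tridiagonal approximation
exact = A @ B
print((approx - exact).norm() / exact.norm())  # relative error of the approximation
```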
Loss is returned as Nan when using padding mask in nn.Transformer
I'm using nn.Transformer for sequence-to-sequence prediction, but when training the model on data with a padding mask, it returns tensors filled with NaNs.
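One common cause of this symptom (an assumption about this setup, not a confirmed diagnosis): if any sequence in the batch is entirely padding, every key in its attention row is masked, and softmax over a fully masked row produces NaN. A minimal sketch that reproduces the issue in many PyTorch versions and shows one workaround:

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=16, nhead=2, batch_first=True)
src = torch.randn(2, 5, 16)
tgt = torch.randn(2, 4, 16)

# Boolean key padding mask: True marks padded positions.
src_pad = torch.tensor([[False, False, False, True, True],
                        [True,  True,  True,  True, True]])  # row 2 is ALL padding

out = model(src, tgt, src_key_padding_mask=src_pad)
print(out.isnan().any())  # True: softmax over a fully masked row yields NaN

# Workaround: drop (or never batch) fully padded sequences so that
# every attention row has at least one unmasked key.
keep = ~src_pad.all(dim=1)
out = model(src[keep], tgt[keep], src_key_padding_mask=src_pad[keep])
print(out.isnan().any())  # False
```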
Implementing a Zero-Layer Transformer in PyTorch
I’m trying to implement a Zero-Layer Transformer as described in this article or video. I’ve come up with the following implementation:
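For reference, a zero-layer transformer (in the sense used by the "A Mathematical Framework for Transformer Circuits" write-up) has no attention or MLP blocks: logits are just the token embedding followed directly by the unembedding, so the model can only learn bigram statistics. A minimal sketch, with all sizes illustrative and not taken from the original post:

```python
import torch
import torch.nn as nn

class ZeroLayerTransformer(nn.Module):
    # logits = W_U @ W_E[token]; no attention, no MLP, no positions needed.
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)              # W_E
        self.unembed = nn.Linear(d_model, vocab_size, bias=False)   # W_U

    def forward(self, tokens):                   # tokens: (batch, seq)
        return self.unembed(self.embed(tokens))  # (batch, seq, vocab)

model = ZeroLayerTransformer(vocab_size=1000, d_model=64)
tokens = torch.randint(0, 1000, (8, 32))
logits = model(tokens)
# Next-token prediction objective: position t predicts token t + 1.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, 1000),
    tokens[:, 1:].reshape(-1))
```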
PyTorch model learns just data imbalance
I am currently doing some research with machine learning and I am facing some issues using PyTorch with Opacus. Starting at epoch one, I get an accuracy of roughly 0.61, and this number does not increase no matter how I choose the number of epochs and parameters like batch size, learning rate, …
The accuracy closely matches the class distribution of my training data (binary classification), so I suspect my model does not learn from the data but simply predicts the majority class. I am pretty sure I made some mistakes in the implementation that cause the model to learn nothing from the data. Ideally, I want to keep SGD as the optimizer for project-related reasons.
I use the Opacus module to get differential privacy. I want to predict the heart-disease column based on the other attributes. These data are just dummy data with the encoding I will use later for different sets. I would really appreciate it if you could help me 🙂
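One common fix when a model collapses to the majority class is to weight the loss by inverse class frequency; this keeps SGD and is compatible with Opacus DP-SGD. A minimal sketch, assuming a two-logit binary classifier; the dummy data, feature count, and hyperparameters are placeholders, not the poster's actual setup:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Placeholder dummy data: 13 features, imbalanced binary target.
X = torch.randn(1000, 13)
y = (torch.rand(1000) < 0.39).long()   # ~61% zeros, matching the observed accuracy

model = nn.Sequential(nn.Linear(13, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loader = DataLoader(TensorDataset(X, y), batch_size=64)

# Weight each class by its inverse frequency so the minority class
# contributes as much to the loss as the majority class.
counts = torch.bincount(y, minlength=2).float()
criterion = nn.CrossEntropyLoss(weight=counts.sum() / (2 * counts))

# Opacus 1.x: wrap model, optimizer, and loader for DP-SGD.
model, optimizer, loader = PrivacyEngine().make_private(
    module=model, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.0, max_grad_norm=1.0)

for xb, yb in loader:
    optimizer.zero_grad()
    criterion(model(xb), yb).backward()
    optimizer.step()
```

Balanced accuracy (or a confusion matrix) is a better progress signal here than plain accuracy, which a majority-class predictor already achieves at 0.61.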
How to train model in Pytorch with xml files?
My train folder contains the folders gest_a and gest_b, and each of them holds photos and XML files.
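A minimal sketch of a custom Dataset that pairs each photo with its XML annotation, assuming Pascal VOC-style XML (the tag names, class names, and folder layout here are assumptions; adapt them to the actual schema):

```python
import os
import xml.etree.ElementTree as ET
import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class GestureDataset(Dataset):
    def __init__(self, root):  # root contains gest_a/ and gest_b/
        self.samples = []
        for label, cls in enumerate(["gest_a", "gest_b"]):
            folder = os.path.join(root, cls)
            for name in os.listdir(folder):
                if name.endswith(".xml"):
                    self.samples.append((os.path.join(folder, name), label))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        xml_path, label = self.samples[idx]
        node = ET.parse(xml_path).getroot()
        # Assumed VOC-style tags: <filename> plus one <object>/<bndbox>.
        img_path = os.path.join(os.path.dirname(xml_path),
                                node.find("filename").text)
        box = node.find("object/bndbox")
        bbox = torch.tensor([float(box.find(t).text)
                             for t in ("xmin", "ymin", "xmax", "ymax")])
        image = read_image(img_path).float() / 255.0
        return image, {"boxes": bbox.unsqueeze(0),
                       "labels": torch.tensor([label])}
```

From there, training is the usual DataLoader loop; for detection models a custom collate_fn (e.g. collecting samples into a list) is typically needed because targets vary in size.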
How do I train on multiple gpus?
I have this training code from Spotify Research’s GitHub and want to run it on multiple GPUs. I attached the script and the way they run it using torch.distributed.launch, but I don’t understand how exactly the distributed part works.
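At a high level, torch.distributed.launch (and its successor torchrun) starts one process per GPU and tells each process its rank; inside the script, the three essential pieces are process-group initialization, a DistributedSampler so each rank sees a distinct data shard, and a DistributedDataParallel wrapper that synchronizes gradients. A minimal sketch, independent of the Spotify Research script (model and data are placeholders):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun (and torch.distributed.launch with --use_env) sets
    # LOCAL_RANK for each spawned process.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # averages grads across ranks

    dataset = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))
    sampler = DistributedSampler(dataset)  # each rank gets a distinct shard
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    for epoch in range(3):
        sampler.set_epoch(epoch)  # reshuffle the sharding each epoch
        for xb, yb in loader:
            xb, yb = xb.cuda(local_rank), yb.cuda(local_rank)
            optimizer.zero_grad()
            torch.nn.functional.cross_entropy(model(xb), yb).backward()
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=NUM_GPUS script.py
```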