Converting a generative transformer model from Keras to PyTorch

I would like to re-create the following Keras model in PyTorch.