How to mask a multi-head attention layer? I'm trying to make a Transformer model that can receive sequences of variable length, but I can't figure out how to mask the attention over the padded positions.
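
For reference, here is a minimal sketch of what I think the masking should look like (this assumes PyTorch's `nn.MultiheadAttention` and its `key_padding_mask` argument; the tensor shapes and argument names are my guesses for illustration, and other frameworks would differ):

```python
import torch
import torch.nn as nn

# Toy batch: two sequences of different lengths, padded to the same length.
embed_dim, num_heads, pad_len = 16, 4, 5
x = torch.randn(2, pad_len, embed_dim)          # (batch, seq, embed)
lengths = torch.tensor([5, 3])                  # real length of each sequence

# key_padding_mask is True where a position is padding and should be ignored.
positions = torch.arange(pad_len).unsqueeze(0)          # (1, seq)
key_padding_mask = positions >= lengths.unsqueeze(1)    # (batch, seq), bool

mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
out, attn_weights = mha(x, x, x, key_padding_mask=key_padding_mask)

print(out.shape)               # torch.Size([2, 5, 16])
print(attn_weights[1, :, 3:])  # weights on the padded positions should be 0
```

Is passing a boolean padding mask like this the right way to handle variable-length sequences, or is there something else I need to do inside the Transformer?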