Llama 2 attention weights shape doesn't look right
I'm running Llama 2 70B from Hugging Face and passing output_attentions=True in Transformers to get the attention weights. It's supposed to return a tuple of shape (layers, batch size, attention heads, input size, input size), but what I actually get works out to (7, 80, 1, 64, input size, input size). What is the extra dimension of 7 supposed to be?

Also, I thought causal models were supposed to have lower-triangular attention matrices, but the output isn't lower triangular. Why is that?
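For reference, here's a minimal sketch of the kind of call I'm making (the model name, prompt, and dtype are placeholders; my actual script differs):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder setup: the real run loads the 70B checkpoint sharded across GPUs
model_name = "meta-llama/Llama-2-70b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# What I expected: out.attentions is a tuple with one tensor per layer,
# each of shape (batch_size, num_heads, seq_len, seq_len)
print(len(out.attentions))       # number of layers (80 for the 70B model)
print(out.attentions[0].shape)   # (batch, heads, seq_len, seq_len)?
```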