Llama attention weights shape not looking right

I’m running Llama2 70B from Hugging Face and passing output_attentions=True to Transformers to get the attention weights. As I understand it, the output should be a tuple of per-layer tensors, i.e. effectively (layers, batch size, attention heads, sequence length, sequence length), but what I actually get is shaped like (7, 80, 1, 64, sequence length, sequence length). What is the extra leading dimension of 7 supposed to be? Also, I thought causal models were supposed to produce lower-triangular attention matrices, but the output isn’t lower triangular. Why is this?
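For context, here is a minimal sketch of the kind of call that produces these outputs. The model ID, dtype, prompt, and max_new_tokens value are placeholders rather than my exact setup; the shape comments reflect what the Transformers docs say each call returns.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder setup; adjust model ID, dtype, and device mapping to taste.
model_name = "meta-llama/Llama-2-70b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)

# A plain forward pass: attentions is a tuple with one tensor per layer
# (80 for Llama2 70B), each of shape (batch, heads, seq_len, seq_len).
out = model(**inputs, output_attentions=True)
print(len(out.attentions), out.attentions[0].shape)

# With generate(), the attentions come nested one level deeper: one tuple
# per generated token, each containing the per-layer tensors for that step.
gen = model.generate(
    **inputs,
    max_new_tokens=7,  # arbitrary small number for illustration
    output_attentions=True,
    return_dict_in_generate=True,
)
print(len(gen.attentions))         # number of generation steps
print(len(gen.attentions[0]))      # number of layers
print(gen.attentions[0][0].shape)  # (batch, heads, query_len, key_len)
```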