`masked_fill_` in PyTorch with a list or tensor of mask values instead of a single value
In PyTorch, to mask out padded tokens before calculating attention scores, I use this function:
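To make the question concrete, here is a minimal sketch of the kind of masking I mean. It is an illustration under assumptions, not necessarily the exact function: the name `mask_scores`, the argument `fill_values`, and the tensor shapes are all hypothetical. As far as I can tell, `masked_fill_` takes a single scalar (or 0-dim tensor) as its value, so the sketch uses `torch.where`, which broadcasts a tensor of fill values against the scores.

```python
import torch

def mask_scores(scores, pad_mask, fill_values):
    """Hypothetical helper: fill padded key positions with a per-sample value.

    scores:      (batch, heads, q_len, k_len) raw attention scores
    pad_mask:    (batch, k_len) bool, True where the key token is padding
    fill_values: (batch,) one fill value per sample (assumed layout)
    """
    # Reshape so that everything broadcasts against `scores`.
    mask = pad_mask[:, None, None, :]                          # (batch, 1, 1, k_len)
    fill = fill_values[:, None, None, None].to(scores.dtype)   # (batch, 1, 1, 1)
    # Take `fill` where the mask is True, keep the original score elsewhere.
    return torch.where(mask, fill, scores)

# Toy inputs: 2 samples, 1 head, 3 queries, 4 keys.
scores = torch.zeros(2, 1, 3, 4)
pad_mask = torch.tensor([[False, False, True, True],
                         [False, False, False, True]])
fill_values = torch.tensor([-1e9, -1e4])

masked = mask_scores(scores, pad_mask, fill_values)
```

Unlike `masked_fill_`, `torch.where` is not in-place; it returns a new tensor, so the result has to be assigned back if the original variable should change.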