I would like to build a semantic segmentation model based on the Key-Value Transformer. In that paper, the query has been removed from the self-attention computation.
Therefore, I am not sure whether I can still use ViT as my backbone. If not, should I build a new backbone, or is there another solution?
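For reference, below is a minimal NumPy sketch of how I understand query-free attention. It assumes the scores are computed from the keys alone, softmax(K K^T / sqrt(d)) V. The function names, shapes and the single-head setup are my own illustration and are not code from the paper. It shows that the key-value version returns a tensor of the same shape as standard query-key-value attention. So, at least shape-wise, it looks like it could replace the attention module inside each ViT block.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def qkv_attention(x, w_q, w_k, w_v):
    # Standard single-head self-attention as used in ViT blocks.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def kv_attention(x, w_k, w_v):
    # Assumed query-free variant: keys are compared with keys,
    # so the (tokens x tokens) score matrix is symmetric.
    k, v = x @ w_k, x @ w_v
    scores = k @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
num_tokens, dim = 197, 64  # e.g. 196 patch tokens + 1 class token
x = rng.standard_normal((num_tokens, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))

out_qkv = qkv_attention(x, w_q, w_k, w_v)
out_kv = kv_attention(x, w_k, w_v)
print(out_qkv.shape, out_kv.shape)  # (197, 64) (197, 64)
```

If this reading is right, the patch embedding, MLP and segmentation head of a ViT-based model would not need to change. However, pretrained ViT weights include query projections that this variant would simply not use.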