Alternative to the receptive field in Transformers, and what factors affect it
I want to transition from Computer Vision into NLP, and I have recently begun exploring the field. Most of the topics seem to make sense to me; however, the one I’m struggling to comprehend is how the receptive field works in NLP models, particularly in transformers.