Why detach probs in the symmetric_kl function (DeBERTa SiFT)?
In the official implementation of SiFT in Microsoft/DeBERTa, the symmetric_kl function is implemented as: