Is a volatile loss normal when training a ViT with a Masked Autoencoder (MAE)?
Hi, I am training a ViT backbone (patch 16, 224×224 input, ImageNet-pretrained) on satellite imagery (the Million-AID dataset, ~900,000 images of varying sizes) in a self-supervised fashion using a Masked Autoencoder (MAE). The loss I get during training is very volatile, and I wanted to confirm whether this behavior is normal when training a ViT in general, or specifically when training a ViT with the MAE approach.
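For context, below is a minimal sketch of the kind of MAE objective I am describing (illustrative only, not my exact code; the module names, the tiny 2-block decoder, and the 75% mask ratio are assumptions taken from the usual MAE defaults). The point it shows is that the loss is an MSE computed on a different random subset of masked patches every step, so the raw per-batch value has an inherent source of variance on top of ordinary minibatch noise.

```python
# Minimal MAE-style sketch (assumed setup, not my actual training code):
# a patch-16 encoder sees only the visible patches, a small decoder
# reconstructs pixels, and the loss is MSE on the masked patches only.
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=768, depth=12, heads=12,
                 dec_dim=512, dec_depth=2, mask_ratio=0.75):
        super().__init__()
        self.patch = patch
        self.num_patches = (img_size // patch) ** 2
        self.mask_ratio = mask_ratio

        # Encoder: patch embedding + Transformer blocks (run on visible patches only).
        # Simplified relative to the real ViT-B/16 (no cls token, post-norm blocks).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        enc_layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, depth)

        # Lightweight decoder that reconstructs pixel values for every patch.
        self.enc_to_dec = nn.Linear(dim, dec_dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dec_dim))
        self.dec_pos_embed = nn.Parameter(torch.zeros(1, self.num_patches, dec_dim))
        dec_layer = nn.TransformerEncoderLayer(dec_dim, 8, dec_dim * 4, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, dec_depth)
        self.to_pixels = nn.Linear(dec_dim, patch * patch * 3)

    def patchify(self, imgs):
        # (B, 3, H, W) -> (B, N, patch*patch*3) pixel targets.
        p = self.patch
        B, C, H, W = imgs.shape
        x = imgs.reshape(B, C, H // p, p, W // p, p)
        return x.permute(0, 2, 4, 3, 5, 1).reshape(B, self.num_patches, p * p * C)

    def forward(self, imgs):
        B = imgs.shape[0]
        tokens = self.patch_embed(imgs).flatten(2).transpose(1, 2) + self.pos_embed

        # Random per-sample masking: keep 25% of patches, a *different* subset each step.
        num_keep = int(self.num_patches * (1 - self.mask_ratio))
        noise = torch.rand(B, self.num_patches, device=imgs.device)
        ids_shuffle = noise.argsort(dim=1)
        ids_keep = ids_shuffle[:, :num_keep]
        visible = torch.gather(
            tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))

        latent = self.encoder(visible)

        # Decoder input: encoded visible tokens + mask tokens, restored to image order.
        dec_tokens = self.enc_to_dec(latent)
        mask_tokens = self.mask_token.expand(B, self.num_patches - num_keep, -1)
        combined = torch.cat([dec_tokens, mask_tokens], dim=1)
        ids_restore = ids_shuffle.argsort(dim=1)
        combined = torch.gather(
            combined, 1, ids_restore.unsqueeze(-1).expand(-1, -1, combined.shape[-1]))
        pred = self.to_pixels(self.decoder(combined + self.dec_pos_embed))

        # MSE on the masked patches only, as in the MAE objective.
        target = self.patchify(imgs)
        mask = torch.ones(B, self.num_patches, device=imgs.device)
        mask.scatter_(1, ids_keep, 0.0)  # 1 = masked, 0 = visible
        loss = ((pred - target) ** 2).mean(dim=-1)
        return (loss * mask).sum() / mask.sum()

if __name__ == "__main__":
    # Quick smoke test on random data: each call draws a fresh random mask,
    # which is one reason the logged per-step loss jumps around.
    model = TinyMAE(depth=2)  # shallow encoder just so this runs quickly
    imgs = torch.randn(2, 3, 224, 224)
    print(model(imgs).item())
```

What I am plotting is essentially this per-step loss value, one point per batch, which is where the volatility shows up.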