Passing ViTMAE encoder hidden states to a U-Net decoder for binary semantic segmentation (Hugging Face ViTMAEModel)
I don’t understand how to pass the hidden_states of the ViTMAEModel encoder into the U-Net decoder. I saw a figure showing a similar approach, and it involves “reshaping”. How do I pass in the hidden states, and how do I reshape them? Is it like the unpatchify function, where I get rid of the CLS token? What do I do with the CLS token?
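Below is a minimal sketch of one way to do this. It assumes PyTorch, a ViT-B/16-sized encoder on 224×224 inputs (a 14×14 patch grid), and an encoder loaded with `mask_ratio=0.0`. With the default `mask_ratio` of 0.75, ViTMAEModel returns only the visible 25% of patch tokens, so they cannot be reshaped back into a full grid. Even with `mask_ratio=0.0`, the random-masking step still shuffles the patch tokens. The model output's `ids_restore` tensor puts them back in raster order. Each hidden state has shape `(B, 1 + N, D)` with the CLS token at index 0. The sketch drops the CLS token, unshuffles the remaining N tokens, and reshapes `(B, N, D)` into a `(B, D, H/16, W/16)` feature map.

This is similar in spirit to `unpatchify`, which also does a raster reshape without the CLS token. The difference is that `unpatchify` turns decoder pixel predictions of shape `(B, N, P*P*C)` back into an image, whereas here the D embedding dimensions are kept as feature channels.

Every encoder layer produces the same H/16 resolution. To give a U-Net-style decoder skip connections to fuse, the hypothetical `ViTUNetDecoder` resamples four of those layers into a pyramid at H/4, H/8, H/16 and H/32. The choice of layers 3, 6, 9 and 12 is an assumption borrowed from UNETR/DPT-style designs. `tokens_to_feature_map`, `UpBlock`, `ViTUNetDecoder` and `forward_segmentation` are illustrative names, not part of transformers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def tokens_to_feature_map(hidden_state: torch.Tensor,
                          ids_restore: torch.Tensor,
                          grid_size: int) -> torch.Tensor:
    """Convert one ViTMAE hidden state (B, 1 + N, D) into a (B, D, grid, grid) map.

    Assumes the encoder ran with mask_ratio=0.0, so all N = grid * grid patches are present.
    """
    patch_tokens = hidden_state[:, 1:, :]  # drop the CLS token at index 0
    b, n, d = patch_tokens.shape
    if n != grid_size * grid_size:
        raise ValueError(f"expected {grid_size ** 2} patch tokens, got {n}; use mask_ratio=0.0")
    # Undo ViTMAE's random shuffle so that token i is patch i in row-major order.
    index = ids_restore.unsqueeze(-1).expand(-1, -1, d)
    patch_tokens = torch.gather(patch_tokens, dim=1, index=index)
    # (B, N, D) -> (B, D, N) -> (B, D, grid, grid): embedding dims become channels.
    return patch_tokens.transpose(1, 2).reshape(b, d, grid_size, grid_size)


class UpBlock(nn.Module):
    """Upsample to the skip's size, concatenate the skip, then convolve (U-Net style)."""

    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        return self.conv(torch.cat([x, skip], dim=1))


class ViTUNetDecoder(nn.Module):
    """Hypothetical decoder: four same-resolution ViT maps -> pyramid -> 1-channel mask logits."""

    def __init__(self, embed_dim: int = 768, chs=(32, 64, 128, 256)):
        super().__init__()
        # Resample the H/16 maps (shallow to deep) to H/4, H/8, H/16 and H/32.
        self.to_pyramid = nn.ModuleList([
            nn.ConvTranspose2d(embed_dim, chs[0], kernel_size=4, stride=4),
            nn.ConvTranspose2d(embed_dim, chs[1], kernel_size=2, stride=2),
            nn.Conv2d(embed_dim, chs[2], kernel_size=1),
            nn.Conv2d(embed_dim, chs[3], kernel_size=3, stride=2, padding=1),
        ])
        self.up3 = UpBlock(chs[3], chs[2], chs[2])
        self.up2 = UpBlock(chs[2], chs[1], chs[1])
        self.up1 = UpBlock(chs[1], chs[0], chs[0])
        self.head = nn.Conv2d(chs[0], 1, kernel_size=1)  # one logit per pixel (binary)

    def forward(self, maps, out_size) -> torch.Tensor:
        f1, f2, f3, f4 = [proj(m) for proj, m in zip(self.to_pyramid, maps)]
        x = self.up3(f4, f3)  # H/32 -> H/16
        x = self.up2(x, f2)   # H/16 -> H/8
        x = self.up1(x, f1)   # H/8 -> H/4
        return F.interpolate(self.head(x), size=out_size, mode="bilinear", align_corners=False)


def forward_segmentation(encoder, decoder, pixel_values, layers=(3, 6, 9, 12), patch_size=16):
    """encoder: a transformers ViTMAEModel loaded with mask_ratio=0.0 (assumes square inputs)."""
    out = encoder(pixel_values=pixel_values, output_hidden_states=True)
    grid = pixel_values.shape[-1] // patch_size
    # hidden_states[0] is the embedding output; hidden_states[i] is the output of layer i.
    maps = [tokens_to_feature_map(out.hidden_states[i], out.ids_restore, grid) for i in layers]
    return decoder(maps, out_size=pixel_values.shape[-2:])
```

For example, assuming the `facebook/vit-mae-base` checkpoint, load the encoder with `encoder = ViTMAEModel.from_pretrained("facebook/vit-mae-base", mask_ratio=0.0)`. Then call `logits = forward_segmentation(encoder, ViTUNetDecoder(), pixel_values)` and train against the binary mask with `nn.BCEWithLogitsLoss`.

Dropping the CLS token is the simplest choice. DPT-style "readout" variants instead add the CLS token to every patch token, or concatenate it to each token and project the result. Either is an option if you want to keep the CLS token's global context.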