How do I choose the right architecture (more precisely, the output layer) for an Optical Character Recognition model?
My question might be silly, but I don't understand how the output layer should be shaped. Each of my images contains 8 symbols, and the alphabet has 25 classes. I need an LSTM model to predict those symbols. I am going to use CTCLoss, so, as far as I understand, the input to the loss function should have the shape (input_length, batch_size, number_of_classes), i.e. in my case (8, batch_size, 25).
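To make the shapes concrete, here is a minimal sketch that assumes PyTorch's `nn.CTCLoss`; the post does not name the framework, so treat that as a guess. The batch size, the number of time steps, the feature size and the hidden size are hypothetical placeholders, and only the 8 symbols and 25 classes come from the description above. Two points in the sketch differ from the shape I wrote: CTC reserves one extra class for the blank symbol, so the last dimension is 26 rather than 25, and the first dimension is the number of LSTM time steps, which only has to be at least as long as the label (longer when a label repeats a character), not exactly 8.

```python
import torch
import torch.nn as nn

# Values taken from the question.
NUM_SYMBOLS = 8        # characters per image
ALPHABET_SIZE = 25     # real character classes

# Hypothetical values, chosen only to make the sketch runnable.
BATCH_SIZE = 4
TIME_STEPS = 32        # LSTM steps per image (e.g. feature-map columns)
FEATURES = 64          # feature size fed to the LSTM at each step
HIDDEN = 128

torch.manual_seed(0)

# The LSTM expects input of shape (T, N, F) when batch_first is False.
lstm = nn.LSTM(FEATURES, HIDDEN, bidirectional=True)
# Output layer: one score per character class plus one for the CTC blank.
fc = nn.Linear(2 * HIDDEN, ALPHABET_SIZE + 1)

features = torch.randn(TIME_STEPS, BATCH_SIZE, FEATURES)
lstm_out, _ = lstm(features)                    # (T, N, 2 * HIDDEN)
log_probs = fc(lstm_out).log_softmax(dim=2)     # (T, N, 26)

# Index 0 is reserved for the blank, so real labels run from 1 to 25.
targets = torch.randint(1, ALPHABET_SIZE + 1, (BATCH_SIZE, NUM_SYMBOLS))
input_lengths = torch.full((BATCH_SIZE,), TIME_STEPS, dtype=torch.long)
target_lengths = torch.full((BATCH_SIZE,), NUM_SYMBOLS, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)

print(log_probs.shape)  # shape passed to the loss: (T, N, C)
print(loss.item())
```

With these placeholder values the tensor passed to the loss has shape (32, batch_size, 26). In a real model, `features` would come from a convolutional front end or from slicing the image into columns rather than from `torch.randn`.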