PPO with LSTM: shape mismatch in the hidden states
I am trying to write a custom policy for PPO+LSTM to be integrated with RLlib, but I persistently get a shape-mismatch error: the inputs the LSTM layers expect differ from what I pass in, and I do not understand why.
I have an environment with an observation space of size 12 and an action space of size 2, both continuous.
I also want to feed the last action and last reward into the LSTM. Since my task is a navigation task with varying episode lengths, I should also mask or pad my sequences, but for the moment I have implemented neither masking nor padding.
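For reference, here is a minimal plain-PyTorch sketch (not RLlib-specific; the dimensions are assumed from the description above) of the shapes an LSTM expects when the previous action and reward are concatenated to the observation. A common cause of this kind of mismatch is forgetting to grow the LSTM's input size by `act_dim + 1`, or passing hidden states shaped `[B, hidden]` instead of `[num_layers, B, hidden]`:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions matching the question: obs=12, action=2.
OBS_DIM, ACT_DIM, HIDDEN = 12, 2, 64
# With the previous action and reward appended to the observation,
# the LSTM input size must grow accordingly: 12 + 2 + 1 = 15.
INPUT_SIZE = OBS_DIM + ACT_DIM + 1

lstm = nn.LSTM(INPUT_SIZE, HIDDEN, batch_first=True)

B, T = 4, 10  # batch of 4 sequences, 10 timesteps each
x = torch.zeros(B, T, INPUT_SIZE)  # [batch, time, obs + prev_action + prev_reward]
# Initial hidden/cell states are shaped [num_layers, B, HIDDEN], NOT [B, HIDDEN].
h0 = torch.zeros(1, B, HIDDEN)
c0 = torch.zeros(1, B, HIDDEN)

out, (hn, cn) = lstm(x, (h0, c0))
print(out.shape)  # torch.Size([4, 10, 64])
print(hn.shape)   # torch.Size([1, 4, 64])
```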
Here is my code snippet and the error I get: