SB3 for imitation learning: how to force the demonstration action at a given state?
I am trying to train an RL agent using SB3 (PPO algorithm), Gymnasium, and PyTorch.
As the dynamics of the environment are quite complex, I have a dataset of about 200 trajectories that I can use as demonstrations. My idea is to use them at training time so that a demonstration is injected every n episodes. When that happens, I force the environment's reset method to sample from the dataset. However, I have difficulty forcing the expert action that I know should be executed at each step.
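To make the setup concrete, here is a minimal sketch of the injection scheme as a wrapper. It rests on a few assumptions. The environment must be able to reset to a chosen start state through `reset(options=...)` (the `initial_state` key is hypothetical). Each demonstration must be stored as a dict with hypothetical keys `initial_state` and `actions`. `ToyEnv` and `DemoInjectionWrapper` are illustrative names only. The classes follow the Gymnasium `reset`/`step` signatures without importing Gymnasium, so the snippet runs standalone. In a real setup the wrapper would subclass `gymnasium.Wrapper` instead.

```python
import random
from typing import Optional


class ToyEnv:
    """Hypothetical 1-D environment standing in for the real (complex) one."""

    def __init__(self) -> None:
        self.state = 0

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
        # Assumption: the env can start from an arbitrary state passed via options.
        self.state = (options or {}).get("initial_state", 0)
        return self.state, {}

    def step(self, action: int):
        self.state += action
        terminated = self.state >= 3
        # Gymnasium-style 5-tuple: obs, reward, terminated, truncated, info
        return self.state, float(action), terminated, False, {}


class DemoInjectionWrapper:
    """Every `demo_every`-th episode, reset to a demonstration's start state and
    replace the agent's action with the recorded expert action at each step."""

    def __init__(self, env, demos: list, demo_every: int) -> None:
        self.env = env
        self.demos = demos  # hypothetical format: {"initial_state": ..., "actions": [...]}
        self.demo_every = demo_every
        self.episode_count = 0
        self.current_demo = None
        self.t = 0  # step index inside the current episode

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
        self.episode_count += 1
        self.t = 0
        if self.episode_count % self.demo_every == 0:
            # Demonstration episode: start from a sampled trajectory's first state.
            self.current_demo = random.choice(self.demos)
            options = {**(options or {}), "initial_state": self.current_demo["initial_state"]}
        else:
            self.current_demo = None
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        extra = {}
        if self.current_demo is not None and self.t < len(self.current_demo["actions"]):
            # Override the policy's action with the expert action for this step.
            action = self.current_demo["actions"][self.t]
            extra["expert_action"] = action  # expose what was actually executed
        self.t += 1
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward, terminated, truncated, {**info, **extra}


if __name__ == "__main__":
    env = DemoInjectionWrapper(ToyEnv(), [{"initial_state": 1, "actions": [1, 1]}], demo_every=2)
    env.reset()                  # episode 1: ordinary episode
    env.reset()                  # episode 2: demonstration episode, starts at state 1
    print(env.step(0))           # agent says 0, expert action 1 is executed instead
```

One caveat, if I read SB3's on-policy rollout collection correctly: the rollout buffer stores the action (and its log-probability) that the policy sampled, not the one the wrapper executed. Overriding the action inside the environment therefore makes PPO credit its own sampled action with the expert's transition. Putting the executed action into `info["expert_action"]` at least makes the substitution visible downstream, for example to a callback reading `self.locals["infos"]`.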