RLlib – testing my trained agent gives bad results
I’m using Ray (version 1.8.0) to train an agent. The agent controls a unit in a simulation, and the simulation can end in one of three different ways: “UnitADestroyed”, “UnitBDestroyed” or “Timeout”. My aim is to maximize the probability of the outcome “UnitADestroyed”, so I give rewards accordingly. I also log the outcome of each simulation.
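A minimal sketch of what this kind of outcome-based reward and outcome logging could look like, in plain Python and independent of RLlib. The reward values and the names `OUTCOME_REWARDS`, `terminal_reward`, and `outcome_rates` are illustrative assumptions, not the actual setup:

```python
from collections import Counter

# Hypothetical terminal rewards; the real values are not stated in this post.
OUTCOME_REWARDS = {
    "UnitADestroyed": 1.0,   # the outcome to maximize
    "UnitBDestroyed": -1.0,
    "Timeout": 0.0,
}


def terminal_reward(outcome: str) -> float:
    """Return the reward given at the end of an episode for this outcome."""
    return OUTCOME_REWARDS[outcome]


def outcome_rates(outcomes: list) -> dict:
    """Return the fraction of logged episodes that ended in each known outcome."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {
        name: (counts[name] / total if total else 0.0)
        for name in OUTCOME_REWARDS
    }


# Example: outcomes logged over four simulated episodes (made-up data).
logged = ["UnitADestroyed", "Timeout", "UnitADestroyed", "UnitBDestroyed"]
rates = outcome_rates(logged)
print(rates)
```

Computing the rates the same way for training episodes and for test episodes makes the two sets of numbers directly comparable.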