DQN loss decreases but rewards do not improve
I am currently working on a university project in which I apply DQN to a warehouse storage allocation problem. I finished implementing the Markov decision process and the full DQN last week, and it runs. The loss decreases over training, as intended. However, this reduction in loss does not translate into higher rewards: the reward values keep fluctuating around the same level without showing any improvement.
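One way to check whether the rewards are really flat, rather than just noisy, is to smooth the per-episode returns with a trailing moving average and look at the trend. The sketch below is only an illustration: `moving_average`, `episode_returns`, and the window size are hypothetical names and values, not taken from the project's code.

```python
from collections import deque


def moving_average(values, window=100):
    """Return the trailing moving average of `values`.

    Each output entry averages the current value and up to `window - 1`
    preceding values, so early entries use a shorter window.
    """
    buf = deque(maxlen=window)
    running_sum = 0.0
    averages = []
    for v in values:
        # When the buffer is full, the oldest value is about to be evicted.
        if len(buf) == window:
            running_sum -= buf[0]
        buf.append(v)
        running_sum += v
        averages.append(running_sum / len(buf))
    return averages


# Hypothetical per-episode returns; in practice these would be logged
# during training (one total reward per episode).
episode_returns = [1.0, 2.0, 3.0, 4.0]
smoothed = moving_average(episode_returns, window=2)
print(smoothed)
```

If the smoothed curve stays level over many episodes while the loss keeps falling, that would suggest the flat reward is not just episode-to-episode noise.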