Tuning an integrator gain with actor-critic does not converge
I am trying to use reinforcement learning (an actor-critic method) to tune the gain of a discrete-time integrator.
However, it does not converge, and the outcome is so sensitive to how I choose the parameters that I wonder whether my approach is correct.
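
To make the setup concrete, here is a minimal sketch of the kind of loop I mean. It is only an illustration: the first-order plant (`a`, `b`), the unit step reference `r`, the quadratic cost, and the name `tracking_cost` are all assumptions made for this example, not my actual system. It evaluates the cost that a learning agent would be trying to minimize for a fixed integrator gain `ki`.

```python
def tracking_cost(ki, steps=200, a=0.9, b=0.1, r=1.0):
    """Sum of squared tracking errors for integrator gain ki.

    Assumed (hypothetical) plant: x[k+1] = a*x[k] + b*u[k].
    Assumed control law:          z[k+1] = z[k] + e[k],  u[k] = ki * z[k+1].
    """
    x = 0.0     # plant state
    z = 0.0     # integrator state (running sum of the error)
    cost = 0.0
    for _ in range(steps):
        e = r - x          # tracking error
        z += e             # discrete-time integration of the error
        u = ki * z         # integral control action
        x = a * x + b * u  # plant update
        cost += e * e      # quadratic tracking cost
    return cost


if __name__ == "__main__":
    # Sweep a few gains to see how the cost changes with ki.
    for ki in (0.01, 0.5, 5.0, 50.0):
        print(f"ki={ki:5.2f}  cost={tracking_cost(ki):.3g}")
```

On this assumed plant the closed loop is stable only for a bounded range of gains, so the cost grows very quickly once `ki` leaves that range. That may be one reason the learning is so sensitive, but I am not sure it explains what I see.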