Computing Near-Optimal Controllers by Learning the Gradient of the Cost-To-Go

Douglas B. Tweed


Keywords: Optimal control, learning algorithms, nonlinear systems, adaptive control


The method of generalized Hamilton-Jacobi-Bellman equations (GHJB) is a powerful way of creating near-optimal controllers. It is based on the fact that if we have a feedback controller, and we learn to approximate the gradient ∇J of its cost-to-go function, then we can use that gradient to define a better controller. We can then use the new controller’s ∇J to define a still-better controller, and so on. Here we point out that GHJB works indirectly in the sense that it doesn’t learn the best approximation to ∇J but instead learns the time derivative dJ/dt, and infers ∇J from that. We show we can get lower-cost controllers with fewer adjustable parameters by learning ∇J directly. We then compare this direct algorithm with GHJB on test problems from the literature.
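The improvement loop described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a scalar linear system x' = a·x + b·u with cost ∫(q·x² + r·u²) dt, for which the cost-to-go of a stabilizing controller u = -k·x is known in closed form as J(x) = p·x², so nothing is learned. It also assumes the standard control-affine improvement rule u = -(1/(2r))·b·∇J. All function and parameter names are hypothetical. It shows only how each controller's ∇J defines the next, lower-cost controller.

```python
import math


def cost_to_go_coeff(a, b, q, r, k):
    """Return p such that J(x) = p * x**2 for the controller u = -k * x.

    Assumed setting: x' = a*x + b*u, cost = integral of (q*x**2 + r*u**2) dt.
    """
    closed_loop = a - b * k
    if closed_loop >= 0:
        raise ValueError("controller must stabilize the system")
    return (q + r * k * k) / (-2.0 * closed_loop)


def improve_gain(b, r, p):
    """Use grad J = 2*p*x to define the next controller.

    u = -(1 / (2*r)) * b * grad J = -(b * p / r) * x, so the new gain is b*p/r.
    """
    return b * p / r


def policy_iteration(a, b, q, r, k0, steps=20):
    """Alternate cost-to-go evaluation and controller improvement."""
    k = k0
    history = []  # cost-to-go coefficient p of each successive controller
    for _ in range(steps):
        p = cost_to_go_coeff(a, b, q, r, k)
        history.append(p)
        k = improve_gain(b, r, p)
    return k, history


def riccati_gain(a, b, q, r):
    """Optimal gain from the scalar Riccati equation 2*a*p - b**2*p**2/r + q = 0."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r


if __name__ == "__main__":
    k, history = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
    print("final gain:", k)
    print("optimal gain:", riccati_gain(1.0, 1.0, 1.0, 1.0))
    print("cost coefficients:", history[:4])
```

In this toy case the cost coefficient p decreases at every step and the gain approaches the Riccati-optimal value. For nonlinear systems no closed form for J is available, which is why ∇J has to be approximated, either indirectly through dJ/dt as in GHJB or directly as the abstract proposes.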
