Hierarchical Reinforcement Learning with Subpolicies Specializing for Learned Subgoals

B. Bakker (Switzerland) and J. Schmidhuber (The Netherlands)


Reinforcement learning, hierarchical reinforcement learning, feedforward neural networks, recurrent neural networks, MDPs, POMDPs, short-term memory


This paper describes a method for hierarchical reinforcement learning in which high-level policies automatically discover subgoals, and low-level policies learn to specialize for different subgoals. Subgoals are represented as desired abstract observations which cluster raw input data. High-level value functions cover the state space at a coarse level; low-level value functions cover only parts of the state space at a fine-grained level. An experiment shows that this method outperforms several flat reinforcement learning methods. A second experiment shows how problems of partial observability due to observation abstraction can be overcome using high-level policies with memory.
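
The idea of a subgoal as a desired abstract observation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the clustering is learned, so the sketch assumes fixed cluster centroids, nearest-centroid assignment, and a binary low-level reward. The names `abstract_observation`, `low_level_reward`, and `centroids` are hypothetical.

```python
import math

def abstract_observation(raw_obs, centroids):
    """Map a raw observation to the index of its nearest centroid.

    Assumption: abstraction is nearest-centroid clustering with Euclidean
    distance; the paper may use a different, learned clustering.
    """
    distances = [math.dist(raw_obs, c) for c in centroids]
    return distances.index(min(distances))

def low_level_reward(raw_obs, subgoal, centroids):
    """Return 1.0 if the raw observation falls in the desired cluster, else 0.0.

    Assumption: the subgoal is a cluster index chosen by a high-level policy,
    and a low-level policy is rewarded only for reaching that cluster.
    """
    return 1.0 if abstract_observation(raw_obs, centroids) == subgoal else 0.0

# Hypothetical 2-D example with three clusters.
centroids = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
subgoal = 2  # the high-level policy asks for abstract observation 2

print(abstract_observation((1.0, 0.5), centroids))
print(low_level_reward((9.0, 9.5), subgoal, centroids))
```

In this sketch the high-level policy only ever sees cluster indices, which is why distinct raw states can look identical to it; the abstract's second experiment addresses that kind of partial observability with high-level policies that have memory.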
