Kernel Rewards Regression: An Information Efficient Batch Policy Iteration Approach

D. Schneegaß, S. Udluft, and T. Martinetz (Germany)


Machine Learning, Intelligent Control, Reinforcement Learning, Policy Iteration, Kernel-based Learning, Learning Theory


We present the novel Kernel Rewards Regression (KRR) method for Policy Iteration in Reinforcement Learning on continuous state domains. Our method obtains useful policies from only a few observed state-action transitions. It treats the Reinforcement Learning problem as a regression task to which any appropriate technique may be applied. The use of kernel methods, e.g. the Support Vector Machine, enables the user to incorporate different types of structural prior knowledge about the state space by redefining the inner product. Furthermore, KRR is a completely off-policy method: the observations may be generated by any sufficiently exploring policy, even a fully random one. We tested the algorithm on three typical Reinforcement Learning benchmarks. Moreover, we give a proof of the correctness of our model and an error bound for estimating the Q-functions.
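
To make the "rewards as regression targets" idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes Q(s, a) is a kernel expansion over the observed transitions, so that the Bellman relation r_i ≈ Q(s_i, a_i) - gamma * Q(s'_i, pi(s'_i)) becomes linear in the coefficients. Ridge-regularised least squares is swapped in for the Support Vector Machine named above, and the Gaussian kernel, the toy one-dimensional task, and all function and parameter names (gram, rewards_regression_pi, greedy_action, bandwidth, ridge, sweeps) are illustrative assumptions.

```python
import numpy as np

def gram(S1, A1, S2, A2, bandwidth=0.1):
    """Gaussian kernel on states times an indicator that the actions match (assumed kernel)."""
    d2 = (S1[:, None] - S2[None, :]) ** 2
    same = (A1[:, None] == A2[None, :]).astype(float)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)) * same

def rewards_regression_pi(S, A, R, S_next, actions, gamma=0.5, ridge=1e-3, sweeps=5):
    """Batch policy iteration on fixed transitions (s, a, r, s').

    Returns alpha with Q(s, a) = sum_j alpha_j * k((s, a), (s_j, a_j)).
    """
    n = len(S)
    K = gram(S, A, S, A)
    # Kernel rows of every successor state paired with every candidate action.
    K_next = np.stack([gram(S_next, np.full(n, a), S, A) for a in actions])
    pi_idx = np.zeros(n, dtype=int)  # arbitrary initial policy: first action everywhere
    alpha = np.zeros(n)
    for _ in range(sweeps):
        # Policy evaluation: regress the rewards on (K - gamma * K_pi), linear in alpha.
        K_pi = K_next[pi_idx, np.arange(n)]
        M = K - gamma * K_pi
        alpha = np.linalg.solve(M.T @ M + ridge * np.eye(n), M.T @ R)
        # Policy improvement: act greedily at the observed successor states.
        pi_idx = np.argmax(K_next @ alpha, axis=0)
    return alpha

def greedy_action(s, alpha, S, A, actions):
    """Greedy action of the fitted Q-function at a single state s."""
    q = [(gram(np.array([s]), np.array([a]), S, A) @ alpha)[0] for a in actions]
    return actions[int(np.argmax(q))]

# Toy task (an assumption for illustration): states in [0, 1], actions move the
# state left or right by 0.1, and the reward equals the action taken.
grid = np.linspace(0.0, 1.0, 21)
actions = [-1.0, 1.0]
S = np.repeat(grid, 2)        # every grid state is observed with both actions
A = np.tile(actions, 21)
R = A.copy()
S_next = np.clip(S + 0.1 * A, 0.0, 1.0)

alpha = rewards_regression_pi(S, A, R, S_next, actions)
print(greedy_action(0.5, alpha, S, A, actions))
```

The transitions here come from a fixed grid with both actions tried at every state rather than from the policy being improved, which mirrors the off-policy setting: any sufficiently exploring data source can supply the regression targets. Structural prior knowledge would enter by replacing gram with a different kernel.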
