Asymptotic Equipartition Property on Empirical Sequence in Reinforcement Learning

K. Iwata, K. Ikeda, and H. Sakai (Japan)


reinforcement learning, Markov decision process, asymptotic equipartition property, type method, typical set


We discuss an important property, the asymptotic equipartition property, on empirical sequences in reinforcement learning. It states that, if the number of time steps is sufficiently large, the typical set of empirical sequences has probability nearly one, all elements of the typical set are nearly equiprobable, and the number of elements in the typical set is an exponential function of the sum of conditional entropies. In addition, the number of elements in the typical set is quite small compared to the number of possible sequences. This property is very useful for analyzing the reinforcement learning process, since most of our attention can be restricted to the typical set of empirical sequences.
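As a rough numerical illustration of these three claims, the sketch below uses a deliberately simplified setting: an i.i.d. Bernoulli source rather than the empirical sequences of a Markov decision process treated in the paper. Sequences are grouped by type (the number of ones), since all sequences of one type have the same probability. The function names `binary_entropy` and `typical_set_stats`, and the parameters `n`, `p`, and `eps`, are hypothetical and chosen only for this example.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def typical_set_stats(n, p, eps):
    """Return (probability, size) of the eps-typical set for n i.i.d.
    Bernoulli(p) draws (an assumed toy setting, not the MDP case).

    A sequence is counted as typical when its per-symbol
    log-probability is within eps of the source entropy.
    """
    h = binary_entropy(p)
    prob, size = 0.0, 0
    for k in range(n + 1):
        # Per-symbol negative log-probability of any sequence with k ones.
        rate = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(rate - h) <= eps:
            count = math.comb(n, k)  # number of sequences of this type
            size += count
            # Work in the log domain to avoid overflow for large n.
            prob += 2.0 ** (math.log2(count) - n * rate)
    return prob, size


if __name__ == "__main__":
    n, p, eps = 200, 0.2, 0.1
    prob, size = typical_set_stats(n, p, eps)
    print("probability of typical set:", prob)
    print("log2(size) / n:", math.log2(size) / n)
    print("entropy H(p):", binary_entropy(p))
    print("fraction of all sequences:", size / 2 ** n)
```

Under these assumptions, the typical set carries most of the probability mass, its size grows roughly as 2 raised to n times the entropy, and it is a vanishing fraction of all 2^n binary sequences as n increases.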
