Turning Lights Out with DQ-Learning

D.V. Batalov and B.J. Oommen (Canada)


Key Words: machine learning, reinforcement learning, game playing


A tabular Reinforcement Learning algorithm, such as Q-learning, is incapable of solving problems with large state spaces due to the explosive growth of the table that must be kept in fast memory (e.g., RAM). Neural networks offer a solution, at the expense of representation, among other things. Yet if we analyze the learning task in terms of the goal arity, we can approach problems with larger state spaces using the Discretized Q-learning algorithm, which dramatically reduces the memory requirements of the Q-table. This paper demonstrates the methodology using the well-known LightsOut puzzle. While analytical methods for solving some variants of this puzzle are known, we proceed under a different premise, putting the learning agent in the much more difficult position of having virtually no initial information about the rules of the puzzle and often being unable to perceive the symmetries inherent in the two-dimensional puzzle grid. Without the described savings, a standard and already thrifty Q-learning solution to a 5x5 variant of the puzzle would require over 3 gigabytes of RAM at 4 bytes per table entry, whereas our Discretized Q-learning solution required only 5 bits per table entry, allowing us to solve the puzzle on a 1-gigabyte machine. Finally, we also consider a variant of this puzzle for which no analytical solutions are known. The Discretized Q-learning algorithm is equally applicable to the new puzzle and can solve it with exactly the same effort.
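To illustrate the idea of storing Q-values at 5-bit precision, the following is a minimal sketch of tabular Q-learning on a small LightsOut board with quantized table entries. The board size (3x3), reward scheme, and learning parameters are illustrative assumptions, not the authors' exact setup, and the table is held as small integers for clarity rather than bit-packed.

```python
import random

N = 3                        # assumed 3x3 board -> 2**9 = 512 states
N_STATES = 2 ** (N * N)
N_ACTIONS = N * N            # one action per cell press
LEVELS = 32                  # 5 bits per table entry, as in the paper
Q_MAX = 1.0                  # assumed range: Q clipped to [0, Q_MAX]

def quantize(q):
    """Map a real-valued Q estimate to one of 32 discrete levels."""
    q = min(max(q, 0.0), Q_MAX)
    return round(q / Q_MAX * (LEVELS - 1))

def dequantize(level):
    """Recover an approximate Q-value from its 5-bit level."""
    return level / (LEVELS - 1) * Q_MAX

def press(state, action):
    """LightsOut rule: toggle a cell and its orthogonal neighbours."""
    r, c = divmod(action, N)
    for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < N and 0 <= cc < N:
            state ^= 1 << (rr * N + cc)
    return state

# Q-table of 5-bit levels; a bit-packed array would realize the
# 5-bits-per-entry memory saving on larger boards.
Q = [[0] * N_ACTIONS for _ in range(N_STATES)]
ALPHA, GAMMA, EPS = 0.2, 0.95, 0.1   # assumed hyperparameters

def step(state):
    """One epsilon-greedy Q-learning update; returns the next state."""
    if random.random() < EPS:
        a = random.randrange(N_ACTIONS)
    else:
        a = max(range(N_ACTIONS), key=lambda x: Q[state][x])
    nxt = press(state, a)
    reward = 1.0 if nxt == 0 else 0.0    # all lights out -> goal
    target = reward + GAMMA * dequantize(max(Q[nxt]))
    old = dequantize(Q[state][a])
    Q[state][a] = quantize(old + ALPHA * (target - old))  # 5-bit store
    return nxt
```

The only change relative to standard Q-learning is the quantize/dequantize pair around each table access; the update rule itself is untouched, which is why the discretized variant trades a small amount of value precision for an 84% reduction in per-entry storage (5 bits versus 32).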
