Achieving Faster Convergence to the Optimal Policy by Using Knowledge of the Unimodal Reward Structure

F. Khawaja, D. Gjoni, M. Huber, D. Cook, and M. Youngblood (USA)


Artificial Intelligence, Intelligent Agents, Machine Learning, Blind


Many single-agent and multi-agent reinforcement learning algorithms converge to optimal policies in complex intelligent systems without taking into account the modality of the reward structure. In this work, we exploit the unimodal reward distribution faced by an intelligent agent and propose a unimodal learning algorithm that converges to the optimal policy faster than the ε-greedy strategy, softmax action selection, and Q-learning. In addition, we describe an automated intelligent mini-blind system as an application that can use our strategy to achieve optimal plant growth faster than these other approaches.
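The abstract does not give the details of the proposed algorithm. What follows is therefore only a minimal sketch of why unimodality helps. It assumes a hypothetical bandit over ordered actions (for instance, discretized mini-blind slat angles) whose mean reward rises to a single peak and then falls. The sketch contrasts an ε-greedy learner, which treats every action independently, with a ternary search, which uses the unimodal shape to discard roughly a third of the remaining actions after each comparison. The ternary search is a standard stand-in for exploiting unimodality and is not the authors' unimodal learning algorithm. The names make_unimodal_bandit, epsilon_greedy, unimodal_search, and samples_per_probe are hypothetical.

```python
import random


def make_unimodal_bandit(n_actions, peak, noise=0.1, seed=0):
    """Hypothetical test bed: ordered actions whose mean reward is unimodal,
    peaking at `peak` and falling off linearly on both sides."""
    rng = random.Random(seed)
    calls = {"n": 0}  # counts how many times the environment is sampled

    def pull(a):
        calls["n"] += 1
        mean = 1.0 - abs(a - peak) / n_actions
        return mean + rng.gauss(0.0, noise)

    return pull, calls


def epsilon_greedy(pull, n_actions, steps, epsilon=0.1, seed=1):
    """Baseline: explore uniformly with probability epsilon, otherwise exploit
    the action with the highest running-average reward."""
    rng = random.Random(seed)
    counts = [0] * n_actions
    means = [0.0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda i: means[i])
        r = pull(a)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental sample mean
    return max(range(n_actions), key=lambda i: means[i])


def unimodal_search(pull, n_actions, samples_per_probe=20):
    """Ternary search over ordered actions; valid only if the mean reward is
    unimodal in the action index (an assumption, not a learned property)."""

    def estimate(a):
        return sum(pull(a) for _ in range(samples_per_probe)) / samples_per_probe

    lo, hi = 0, n_actions - 1
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if estimate(m1) < estimate(m2):
            lo = m1 + 1  # the peak cannot lie at or left of m1
        else:
            hi = m2 - 1  # the peak cannot lie at or right of m2
    # Only a few candidates remain; compare them directly.
    return max(range(lo, hi + 1), key=estimate)


if __name__ == "__main__":
    pull, calls = make_unimodal_bandit(21, peak=13, noise=0.0)
    print(unimodal_search(pull, 21, samples_per_probe=1), calls["n"])
```

With noise set to 0.0 and 21 actions, the ternary search finds the peak in fewer pulls than there are actions, while ε-greedy has to reach the peak through random exploration before it can exploit it. With noisy rewards, samples_per_probe trades extra pulls for a lower risk of discarding the third that contains the peak.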
