Optimizing Proportional Reward in the Multi Agent RL System

A.A. Zadeh, H.S. Shahhoseini (Iran), and S.K. Afshar (Canada)


Reinforcement learning, Multi-Agent System, Markov Game, Optimization, Simulation.


Reinforcement learning methods applied to Multi-Agent Systems (MAS) usually originate from single-agent systems. They therefore determine one reward for each transition from one state to another and apply that reward equally to the Q values of all actions. Assigning a single reward to all actions of a state transition reinforces favorable and unfavorable actions alike, so the learning period is lengthened. In this paper we first review a new method proposed by the authors for Multi-Agent Systems, which assigns a fraction of the reward to each action in proportion to that action's Q value and is therefore called rewarding proportional to Q, or RPQ. Several parameters affect the optimization of RPQ, and these are discussed in this paper. Simulation is used to show how the learning speed and final outcome of RPQ change with the algorithm parameters.
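
The idea of splitting one transition reward in proportion to Q can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given in this abstract. The function name proportional_shares, the clipping of negative Q values to zero, and the equal split when every weight is zero are all assumptions made for illustration.

```python
def proportional_shares(reward, q_values):
    """Split one transition reward across actions in proportion to their Q values.

    Hypothetical helper: the handling of negative or all-zero Q values below
    is an assumption, not something stated in the abstract.
    """
    # Assumption: negative Q values receive no share of the reward.
    weights = [max(q, 0.0) for q in q_values]
    total = sum(weights)
    if total == 0.0:
        # Assumption: with no positive Q value, fall back to an equal split.
        return [reward / len(q_values)] * len(q_values)
    return [reward * w / total for w in weights]


# Two actions took part in the same state transition.
shares = proportional_shares(10.0, [1.0, 3.0])
print(shares)
```

In this sketch the action with the larger Q value receives the larger fraction of the reward, in contrast to the equal assignment that the abstract attributes to methods derived from single-agent learning.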
