DATA-EFFICIENT DEEP REINFORCEMENT LEARNING WITH CONVOLUTION-BASED STATE ENCODER NETWORKS

Qiang Fang, Xin Xu, Yixin Lan, Yichuan Zhang, Yujun Zeng, and Tao Tang

References

[1] R.S. Sutton and A.G. Barto, Reinforcement learning: An introduction (MIT Press, 2018).
[2] X. Yang, H.B. He, and Q.L. Wei, Reinforcement learning for robust adaptive control of partially unknown nonlinear systems subject to unmatched uncertainties, Information Sciences, 463, 2018, 307–322.
[3] S.S. Chong, L.P. Wong, and C.P. Lim, Automatic design of hyper-heuristic based on reinforcement learning, Information Sciences, 245, 2018, 89–107.
[4] X. Xu, Z.H. Huang, L. Zuo, and H.B. He, Manifold-based reinforcement learning via locally linear reconstruction, IEEE Transactions on Neural Networks and Learning Systems, 28(4), 2017, 934–947.
[5] D.B. Zhao, D.R. Liu, and F.L. Lewis, Special issue on deep reinforcement learning and adaptive dynamic programming, IEEE Transactions on Neural Networks and Learning Systems, 29(6), 2018, 2038–2041.
[6] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature, 518(7540), 2015, 529–533.
[7] G.B. Huang, Q.Y. Zhu, and C.K. Siew, Extreme learning machine: Theory and applications, Neurocomputing, 70(1–3), 2006, 489–501.
[8] Y.F. Wei, F.R. Yu, and M. Song, User scheduling and resource allocation in HetNets with hybrid energy supply: An actor-critic reinforcement learning approach, IEEE Transactions on Wireless Communications, 17(1), 2018, 680–692.
[9] X. Xu, H.G. He, and D.W. Hu, Efficient reinforcement learning using recursive least-squares methods, Journal of Artificial Intelligence Research, 16, 2002, 259–292.
[10] H.Z. Wang, Y.L. Wu, and G.Y. Min, Data-driven dynamic resource scheduling for network slicing: A deep reinforcement learning approach, Information Sciences, 498, 2019, 106–116.
[11] J. Baxter, P.L. Bartlett, and L. Weaver, Experiments with infinite-horizon, policy-gradient estimation, Journal of Artificial Intelligence Research, 15(1), 2001, 351–381.
[12] J. Ni, X. Li, M. Hua, and S.X. Yang, Bio-inspired neural network based Q-learning approach for robot path planning in unknown environments, International Journal of Robotics and Automation, 31(6), 2016, 464–474.
[13] T. Yan, W. Zhang, S.X. Yang, and L. Yu, Soft actor-critic reinforcement learning for robotic manipulator with hindsight experience replay, International Journal of Robotics and Automation, 34(5), 2019, 206–216.
[14] J.H. Liu, X. Xu, and Z.H. Huang, Model-free multi-kernel learning control for nonlinear discrete-time systems, International Journal of Robotics and Automation, 32(5), 2017, 401–410.
[15] Z. Chu, D. Zhu, and S.X. Yang, Observer-based adaptive neural network trajectory tracking control for remotely operated vehicle, IEEE Transactions on Neural Networks and Learning Systems, 28(7), 2017, 1633–1645.
[16] Y. Liu, M. Cong, and H. Dong, Reinforcement learning and EGA-based trajectory planning for dual robots, International Journal of Robotics and Automation, 33(4), 2018, 140–149.
[17] N.T. Luy, T. Nguyen, and H.M. Tri, Reinforcement learning-based intelligent tracking control for wheeled mobile robot, Transactions of the Institute of Measurement and Control, 36(7), 2014, 868–877.
[18] G.X. Feng, L. Busoniu, T.M. Guerra, and S. Mohammad, Data-efficient reinforcement learning for energy optimization of power-assisted wheelchairs, IEEE Transactions on Industrial Electronics, 66(12), 2019, 9734–9744.
[19] C. Wu, Y.M. Wang, and Z.J. Yin, Realizing railway cognitive radio: A reinforcement base-station multi-agent model, IEEE Transactions on Intelligent Transportation Systems, 20(4), 2019, 1452–1467.
[20] T. Shimokawa, K. Suzuki, T. Misawa, and Y. Okano, Predicting investment behavior: An augmented reinforcement learning model, Neurocomputing, 72(16–18), 2009, 3447–3461.
[21] D.S. Wookey and G.D. Konidaris, Regularized feature selection in reinforcement learning, Machine Learning, 100(2–3), 2015, 655–676.
[22] Y. Bengio, A. Courville, and P. Vincent, Representation learning: A review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 2013, 1798–1828.
[23] X. Xu, D.W. Hu, and X.C. Lu, Kernel-based least squares policy iteration for reinforcement learning, IEEE Transactions on Neural Networks, 18(4), 2007, 973–992.
[24] L. Yang, Y. Lu, S.X. Yang, T. Guo, and Z. Liang, A secure clustering protocol with fuzzy trust evaluation and outlier detection for industrial wireless sensor networks, IEEE Transactions on Industrial Informatics, 17(7), 2021, 4837–4847.
[25] Z. Ren, S.X. Yang, Q. Sun, and T. Wang, Consensus affinity graph learning for multiple kernel clustering, IEEE Transactions on Cybernetics, 51(6), 2021, 3273–3284.
[26] H.L. Li, D.R. Liu, and D. Wang, Manifold regularized reinforcement learning, IEEE Transactions on Neural Networks and Learning Systems, 29(4), 2018, 932–943.
[27] S.P. Zhao, B. Zhang, and C.L.P. Chen, Joint deep convolutional feature representation for hyperspectral palmprint recognition, Information Sciences, 489, 2019, 167–181.
[28] C. Hodges, M. Bennamoun, and H. Rahmani, Single image dehazing using deep neural networks, Pattern Recognition Letters, 128, 2019, 70–77.
[29] J. Gan, W.Q. Wang, and K. Lu, A new perspective: Recognizing online handwritten Chinese characters via 1-dimensional CNN, Information Sciences, 478, 2019, 375–390.
[30] Y.Y. Chen, J.Q. Wang, B.K. Zhu, M. Tang, and H.Q. Lu, Pixelwise deep sequence learning for moving object detection, IEEE Transactions on Circuits and Systems for Video Technology, 29(9), 2019, 2568–2579.
[31] D. Bozic-Stulic, Z. Marusic, and S. Gotovac, Deep learning approach in aerial imagery for supporting land search and rescue missions, International Journal of Computer Vision, 127(9), 2019, 1256–1278.
[32] X.D. Li, M. Ye, Y.G. Liu, and C. Zhu, Adaptive deep convolutional neural networks for scene-specific object detection, IEEE Transactions on Circuits and Systems for Video Technology, 29(9), 2019, 2538–2551.
[33] U. Raghavendra, H. Fujita, S.V. Bhandary, A. Gudigar, and U.R. Acharya, Deep convolution neural network for accurate diagnosis of glaucoma using digital fundus images, Information Sciences, 441, 2018, 41–49.
[34] H. van Hasselt, A. Guez, and D. Silver, Deep reinforcement learning with double Q-learning, in Proc. 30th AAAI Conference on Artificial Intelligence, 2016, 2094–2100.
[35] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, Deterministic policy gradient algorithms, in Proc. 31st International Conference on Machine Learning (ICML), 2014, 387–395.
[36] J. Schulman, S. Levine, P. Abbeel, M.I. Jordan, and P. Moritz, Trust region policy optimization, in Proc. 32nd International Conference on Machine Learning (ICML), 2015, 1889–1897.
[37] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347, 2017.
[38] T. Hester and P. Stone, Intrinsically motivated model learning for developing curious robots, Artificial Intelligence, 247, 2017, 170–186.
[39] J. Oh, S. Singh, and H. Lee, Value prediction network, in Advances in Neural Information Processing Systems, 2017, 6118–6128.
[40] M. Babaeizadeh, I. Frosio, S. Tyree, J. Clemons, and J. Kautz, Reinforcement learning through asynchronous advantage actor-critic on a GPU, arXiv preprint arXiv:1611.06256, 2016.
[41] L. Tai and M. Liu, Mobile robots exploration through CNN-based reinforcement learning, Robotics and Biomimetics, 3(1), 2016, 1–8.
[42] L. Szoke, S. Aradi, T. Becsi, and P. Gaspar, Driving on highway by using reinforcement learning with CNN and LSTM networks, in Proc. 2020 IEEE 24th International Conference on Intelligent Engineering Systems (INES), 2020.
[43] H. Bae, G. Kim, and J. Kim, Multi-robot path planning method using reinforcement learning, Applied Sciences, 9(15), 2019, 2076–2090.
[44] K.J. Astrom and K. Furuta, Swinging up a pendulum by energy control, Automatica, 36(2), 2000, 287–295.
