DATA-EFFICIENT DEEP REINFORCEMENT LEARNING WITH CONVOLUTION-BASED STATE ENCODER NETWORKS, 1-10.

doi:10.2316/J.2021.206-0763

Qiang Fang, Xin Xu, Yixin Lan, Yichuan Zhang, Yujun Zeng, and Tao Tang

References

[1] R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction (MIT Press, 2018).
[2] X. Yang, H.B. He, and Q.L. Wei, Reinforcement learning for robust adaptive control of partially unknown nonlinear systems subject to unmatched uncertainties, Information Sciences, 463, 2018, 307–322.
[3] S.S. Chong, L.P. Wong, and C.P. Lim, Automatic design of hyper-heuristic based on reinforcement learning, Information Sciences, 245, 2018, 89–107.
[4] X. Xu, Z.H. Huang, L. Zuo, and H.B. He, Manifold-based reinforcement learning via locally linear reconstruction, IEEE Transactions on Neural Networks and Learning Systems, 28(4), 2017, 934–947.
[5] D.B. Zhao, D.R. Liu, and F.L. Lewis, Special issue on deep reinforcement learning and adaptive dynamic programming, IEEE Transactions on Neural Networks and Learning Systems, 29(6), 2018, 2038–2041.
[6] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature, 518(7540), 2015, 529–533.
[7] G.B. Huang, Q.Y. Zhu, and C.K. Siew, Extreme learning machine: Theory and applications, Neurocomputing, 70(1–3), 2006, 489–501.
[8] Y.F. Wei, F.R. Yu, and M. Song, User scheduling and resource allocation in HetNets with hybrid energy supply: An actor-critic reinforcement learning approach, IEEE Transactions on Wireless Communications, 17(1), 2018, 680–692.
[9] X. Xu, H.G. He, and D.W. Hu, Efficient reinforcement learning using recursive least-squares methods, Journal of Artificial Intelligence Research, 16, 2002, 259–292.
[10] H.Z. Wang, Y.L. Wu, and G.Y. Min, Data-driven dynamic resource scheduling for network slicing: A deep reinforcement learning approach, Information Sciences, 498, 2019, 106–116.
[11] J. Baxter, P.L. Bartlett, and L. Weaver, Experiments with infinite-horizon, policy-gradient estimation, Journal of Artificial Intelligence Research, 15(1), 2001, 351–381.
[12] J. Ni, X. Li, M. Hua, and S.X. Yang, Bio inspired neural network based Q-learning approach for robot path planning in unknown environments, International Journal of Robotics and Automation, 31(6), 2016, 464–474.
[13] T. Yan, W. Zhang, S.X. Yang, and L. Yu, Soft actor-critic reinforcement learning for robotic manipulator with hindsight experience replay, International Journal of Robotics and Automation, 34(5), 2019, 206–216.
[14] J.H. Liu, X. Xu, and Z.H. Huang, Model-free multi-kernel learning control for nonlinear discrete-time systems, International Journal of Robotics and Automation, 32(5), 2017, 401–410.
[15] Z. Chu, D. Zhu, and S.X. Yang, Observer-based adaptive neural network trajectory tracking control for remotely operated vehicle, IEEE Transactions on Neural Networks and Learning Systems, 28(7), 2016, 1633–1645.
[16] Y. Liu, M. Cong, and H. Dong, Reinforcement learning and ega-based trajectory planning for dual robots, International Journal of Robotics and Automation, 33(4), 2018, 140–149.
[17] N.T. Luy, T. Nguyen, and H.M. Tri, Reinforcement learning-based intelligent tracking control for wheeled mobile robot, IEEE Transactions on the Institute of Measurement and Control, 36(7), 2014, 868–877.
[18] G.X. Feng, L. Busoniu, T.M. Guerra, and S. Mohammad, Data-efficient reinforcement learning for energy optimization of power-assisted wheelchairs, IEEE Transactions on Industrial Electronics, 66(12), 2019, 9734–9744.
[19] C. Wu, Y.M. Wang, and Z.J. Yin, Realizing railway cognitive radio: A reinforcement base-station multi-agent model, IEEE Transactions on Intelligent Transportation Systems, 20(4), 2019, 1452–1467.
[20] S. Tetsuya, S. Kyoko, M. Tadanobu, and O. Yoshitaka, Predicting investment behavior: An augmented reinforcement learning model, Neurocomputing, 72(16–18), 2009, 3447–3461.
[21] W.S. Dean and K.D. George, Regularized feature selection in reinforcement learning, Machine Learning, 100(2–3), 2015, 655–676.
[22] Y. Bengio, A. Courville, and P. Vincent, Representation learning: A review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 2013, 1798–1828.
[23] X. Xu, D.W. Hu, and X.C. Lu, Kernel-based least squares policy iteration for reinforcement learning, IEEE Transactions on Neural Networks, 18(4), 2007, 973–992.
[24] L. Yang, Y. Lu, S.X. Yang, T. Guo, and Z. Liang, A secure clustering protocol with fuzzy trust evaluation and outlier detection for industrial wireless sensor networks, IEEE Transactions on Industrial Informatics, 7(7), 2021, 4837–4847.
[25] Z. Ren, S.X. Yang, Q. Sun, and T. Wang, Consensus affinity graph learning for multiple kernel clustering, IEEE Transactions on Cybernetics, 51(6), 2021, 3273–3284.
[26] H.L. Li, D.R. Liu, and D. Wang, Manifold regularized reinforcement learning, IEEE Transactions on Neural Networks and Learning Systems, 29(4), 2018, 932–943.
[27] S.P. Zhao, B. Zhang, and C.L. Philip, Joint deep convolutional feature representation for hyperspectral palmprint recognition, Information Sciences, 489, 2019, 167–181.
[28] C. Hodges, M. Bennamoun, and H. Rahmani, Single image dehazing using deep neural networks, Pattern Recognition Letters, 128(9), 2019, 70–77.
[29] J. Gan, W.Q. Wang, and K. Lu, A new perspective: Recognizing online handwritten Chinese characters via 1-dimensional CNN, Information Sciences, 478, 2019, 375–390.
[30] Y.Y. Chen, J.Q. Wang, B.K. Zhu, M. Tang, and H.Q. Lu, Pixelwise deep sequence learning for moving object detection, IEEE Transactions on Circuits and Systems for Video Technology, 29(9), 2019, 2568–2579.
[31] B.S. Dunja, Z. Marusic, and S. Gotovac, Deep learning approach in aerial imagery for supporting land search and rescue missions, International Journal of Computer Vision, 127(9), 2019, 1256–1278.
[32] X.D. Li, M. Ye, Y.G. Liu, and C. Zhu, Adaptive deep convolutional neural networks for scene-specific object detection, IEEE Transactions on Circuits and Systems for Video Technology, 29(9), 2019, 2538–2551.
[33] U. Raghavendra, H. Fujita, S.V. Bhandary, A. Gudigar, and U.R. Acharya, Deep convolution neural network for accurate diagnosis of glaucoma using digital fundus images, Information Sciences, 441, 2018, 41–49.
[34] H.V. Van, A. Guez, and D. Silver, Deep reinforcement learning with double Q-learning, in AAAI, vol. 2 (2016), 2094–2100.
[35] G. Lever, N. Heess, D. Silver, and M. Riedmiller, Deterministic policy gradient algorithms (2014).
[36] S. Levine, P. Abbeel, M.I. Jordan, J. Schulman, and P. Moritz, Trust region policy optimization (2015), 1889–1897.
[37] F. Wolski, P. Dhariwal, A. Radford, J. Schulman, and O. Klimov, Proximal policy optimization algorithms, arXiv preprint (2017).
[38] T. Hester and P. Stone, Intrinsically motivated model learning for developing curious robots, Artificial Intelligence, 247, 2017, 170–186.
[39] J. Oh, S. Singh, and H. Lee, Value prediction network, in Advances in Neural Information Processing Systems (2017), 6118–6128.
[40] M. Babaeizadeh, L. Frosio, S. Tyree, J. Clemons, and J. Kautz, Reinforcement learning through asynchronous advantage actor-critic on a GPU, arXiv (2016).
[41] L. Tai and M. Liu, Mobile robots exploration through CNN-based reinforcement learning, Robotics and Biomimetics, 3(1), 2016, 1–8.
[42] L. Szoke, S. Aradi, T. Becsi, and P. Gaspar, Driving on highway by using reinforcement learning with CNN and LSTM networks, in 2020 IEEE 24th International Conference on Intelligent Engineering Systems (INES) (2020).
[43] H. Bae, G. Kim, and J. Kim, Multi-robot path planning method using reinforcement learning, Applied Sciences, 9(15), 2019, 2076–2090.
[44] K.J. Astrom and K. Furuta, Swinging up a pendulum by energy control, Automatica, 36(2), 2000, 287–295.

From Journal (206) International Journal of Robotics and Automation - 2021