International Journal of Control, Automation and Systems 2012; 10(1): 166-172
Published online February 4, 2012
https://doi.org/10.1007/s12555-012-0119-9
© The International Journal of Control, Automation, and Systems
Yong Song, Yi-bin Li*, Cai-hong Li, and Gui-fang Zhang
Shandong University, China

This article demonstrates that Q-learning can be accelerated by appropriately specifying initial Q-values with a dynamic wave expansion neural network. In our method, the neural network has the same topography as the robot's workspace, and each neuron corresponds to a discrete state. Every neuron of the network reaches an equilibrium state determined by the initial environment information. Once the network is stable, the activity of a neuron denotes the maximum cumulative reward obtainable by following the optimal policy from the corresponding state. The initial Q-value of a state-action pair is then defined as the immediate reward plus the maximum cumulative reward obtainable by following the optimal policy from the succeeding state. In this way, the neural network creates a mapping from the known environment information to the initial values of the Q-table, so prior knowledge is incorporated into the learning system and gives the robot a better learning foundation. Results of experiments on a grid-world problem show that neural network-based Q-learning enables a robot to acquire an optimal policy with better learning performance than conventional Q-learning and potential field-based Q-learning.
Keywords: Mobile robot, neural network, Q-values initialization, reinforcement learning.
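The initialization rule described in the abstract can be sketched in code. The following Python snippet is only an illustrative reading of that rule: the grid encoding, the step and goal rewards, the discount factor GAMMA, and the simple fixed-point relaxation standing in for the dynamic wave expansion neural network are all assumptions for the sketch, not details taken from the paper.

import numpy as np

# Illustrative sketch of neural network-based Q-value initialization.
# GAMMA, the reward values, and the relaxation loop below are assumptions;
# the paper's dynamic wave expansion neural network is only approximated
# here by a value-propagation sweep over a grid of "neurons".

GAMMA = 0.95
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def wave_expansion_values(grid, goal):
    """Propagate activity outward from the goal until every cell ('neuron')
    reaches equilibrium; V[s] then plays the role of the maximum cumulative
    reward obtainable by following the optimal policy from state s."""
    rows, cols = grid.shape
    V = np.zeros((rows, cols))
    V[goal] = 1.0                       # the goal neuron seeds the wave
    changed = True
    while changed:                      # relax until the network is stable
        changed = False
        for r in range(rows):
            for c in range(cols):
                if grid[r, c] == 1 or (r, c) == goal:
                    continue            # skip obstacles and the goal itself
                neighbours = [V[r + dr, c + dc]
                              for dr, dc in ACTIONS
                              if 0 <= r + dr < rows and 0 <= c + dc < cols
                              and grid[r + dr, c + dc] == 0]
                new_v = GAMMA * max(neighbours) if neighbours else 0.0
                if abs(new_v - V[r, c]) > 1e-9:
                    V[r, c] = new_v
                    changed = True
    return V

def init_q_table(grid, goal, step_reward=-0.04, goal_reward=1.0):
    """Initial Q(s, a) = immediate reward + (discounted) value of the
    successor state, mirroring the initialization rule in the abstract."""
    rows, cols = grid.shape
    V = wave_expansion_values(grid, goal)
    Q = np.zeros((rows, cols, len(ACTIONS)))
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] == 1:
                continue                # no Q-values for obstacle states
            for a, (dr, dc) in enumerate(ACTIONS):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr, nc] == 1:
                    nr, nc = r, c       # a blocked move leaves the state unchanged
                reward = goal_reward if (nr, nc) == goal else step_reward
                Q[r, c, a] = reward + GAMMA * V[nr, nc]
    return Q

# Example: 5 x 5 grid world with an obstacle wall and the goal in a corner.
grid = np.zeros((5, 5), dtype=int)
grid[1:4, 2] = 1                        # cells marked 1 are obstacles
Q0 = init_q_table(grid, goal=(0, 4))
print(Q0[4, 0])                         # initial Q-values of the start state

With such an initialization, states nearer the goal start with larger Q-values, so early exploration is biased toward the goal region, which is the acceleration effect the abstract describes.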