Regular Papers

International Journal of Control, Automation, and Systems 2024; 22(10): 3068-3082

https://doi.org/10.1007/s12555-023-0616-z

© The International Journal of Control, Automation, and Systems

TD3 Algorithm of Dynamic Classification Replay Buffer Based PID Parameter Optimization

Haojun Zhong and Zhenlei Wang*

East China University of Science and Technology

Abstract

PID controllers are widely used in industrial control, but their performance depends heavily on the controller parameters, and tuning those parameters is cumbersome and inefficient. Recently, deep reinforcement learning has been gradually introduced into the industrial control field because it can learn autonomously by interacting with the environment. This paper proposes a PID parameter optimization method based on the TD3 algorithm with a dynamic classification replay buffer (DCRB-TD3). An optimization framework is designed that converts the optimization of the PID parameters into the learning of the weights of the actor network. To improve the learning efficiency of the reinforcement learning algorithm, avoid dispersion of the control curve, and ensure that the whole process can be optimized continuously in closed loop, the regular TD3 algorithm is improved: a dynamic classification ratio strategy is designed, and a sampling update method for dynamic classification experience replay is proposed. Finally, simulations are performed on various systems, and DCRB-TD3 is compared with a PID parameter optimization method based on the PSO algorithm. The results show that the PID parameters optimized by DCRB-TD3 yield better control performance than those obtained by the other methods.

Keywords: Deep reinforcement learning, intelligent optimization, PID parameter optimization, twin delayed deep deterministic policy gradient (TD3).
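The abstract does not give the classification criterion or the ratio schedule used by DCRB-TD3, so the following is only a minimal sketch of the general idea of a classified replay buffer with a sampling ratio that changes during training. Everything specific here is an assumption made for illustration and is not taken from the paper: the class name `ClassifiedReplayBuffer`, the split into two pools by a fixed `reward_threshold`, and the linear schedule from `start_ratio` to `end_ratio` over `anneal_steps`.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Two-pool replay buffer with a step-dependent sampling ratio.

    Illustrative only: the reward-threshold split and the linear ratio
    schedule are assumptions, not the rule used in DCRB-TD3.
    """

    def __init__(self, capacity, reward_threshold,
                 start_ratio=0.5, end_ratio=0.9, anneal_steps=1000):
        self.high = deque(maxlen=capacity)  # transitions with reward >= threshold
        self.low = deque(maxlen=capacity)   # all other transitions
        self.reward_threshold = reward_threshold
        self.start_ratio = start_ratio
        self.end_ratio = end_ratio
        self.anneal_steps = anneal_steps

    def add(self, transition):
        # transition = (state, action, reward, next_state, done)
        reward = transition[2]
        pool = self.high if reward >= self.reward_threshold else self.low
        pool.append(transition)

    def ratio(self, step):
        # Fraction of each batch drawn from the high-reward pool,
        # interpolated linearly and clamped to [start_ratio, end_ratio].
        frac = min(max(step / self.anneal_steps, 0.0), 1.0)
        return self.start_ratio + frac * (self.end_ratio - self.start_ratio)

    def sample(self, batch_size, step):
        # Target split, limited by what each pool actually holds.
        n_high = min(round(batch_size * self.ratio(step)), len(self.high))
        n_low = min(batch_size - n_high, len(self.low))
        # If the low pool ran short, top up from the high pool when possible.
        n_high = min(batch_size - n_low, len(self.high))
        batch = (random.sample(list(self.high), n_high)
                 + random.sample(list(self.low), n_low))
        random.shuffle(batch)
        return batch
```

In a TD3 training loop, a buffer like this would stand in for the uniform replay buffer: transitions are stored with `add`, and each critic/actor update draws its minibatch with `sample(batch_size, step)`. With the default values assumed above, the batch starts evenly split between the two pools and shifts toward high-reward transitions as `step` grows.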

Published online October 1, 2024 https://doi.org/10.1007/s12555-023-0616-z

IJCAS

eISSN 2005-4092
pISSN 1598-6446