Regular Paper

International Journal of Control, Automation and Systems 2022; 20(9): 3098-3109

Published online August 17, 2022

https://doi.org/10.1007/s12555-021-0473-6

© The International Journal of Control, Automation, and Systems

Twin Deterministic Policy Gradient Adaptive Dynamic Programming for Optimal Control of Affine Nonlinear Discrete-time Systems

Jiahui Xu, Jingcheng Wang*, Jun Rao, Yanjiu Zhong, and Shangwei Zhao

Shanghai Jiao Tong University

Abstract

Recent achievements in the field of adaptive dynamic programming (ADP), together with the data resources and computational capabilities of modern control systems, have led to growing interest in learning-based and data-driven control technologies. This paper proposes a twin deterministic policy gradient adaptive dynamic programming (TDPGADP) algorithm to solve the optimal control problem for a discrete-time affine nonlinear system in a model-free scenario. To mitigate the overestimation problem resulting from function approximation errors, the minimum of the two Q-network estimates is used to update the control policy. The convergence of the proposed algorithm, in which the value function serves as a Lyapunov function, is verified. By designing a twin actor-critic network structure and combining a target network with a specially designed adaptive experience replay mechanism, the algorithm is convenient to implement and the sample efficiency of the learning process is improved. Two simulation examples verify the efficacy of the proposed method.

Keywords Adaptive dynamic programming, affine nonlinear system, optimal control, twin deterministic policy gradient.
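The core idea the abstract describes, using the minimum of two independently trained Q estimates when forming the Bellman target so that approximation errors do not compound into overestimation, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy linear critics, weight vectors, and function names here are assumptions introduced purely for exposition.

```python
import numpy as np

def critic(w, s, a):
    """Toy linear-in-features critic: Q(s, a) = phi(s, a)^T w.
    A stand-in for one of the twin Q networks (illustrative only)."""
    feats = np.concatenate([s, a, s * a])
    return float(feats @ w)

def twin_target(r, s_next, a_next, w1_tgt, w2_tgt, gamma=0.99):
    """Clipped double-Q Bellman target: take the MINIMUM of the two
    target-critic estimates before discounting, which biases the target
    downward and counters overestimation from approximation error."""
    q1 = critic(w1_tgt, s_next, a_next)
    q2 = critic(w2_tgt, s_next, a_next)
    return r + gamma * min(q1, q2)

# Example: critic 1 overestimates (all-ones weights), critic 2 does not.
s_next = np.array([1.0])
a_next = np.array([1.0])
w1 = np.ones(3)    # Q1(s', a') = 3.0
w2 = np.zeros(3)   # Q2(s', a') = 0.0
y = twin_target(1.0, s_next, a_next, w1, w2)  # 1.0 + 0.99 * min(3, 0) = 1.0
```

Both critics are then regressed toward the same target `y`, while the actor is updated against the pessimistic (minimum) estimate, which is the mechanism the abstract credits for stabilizing the policy update.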



eISSN 2005-4092
pISSN 1598-6446