International Journal of Control, Automation, and Systems 2024; 22(8): 2591-2601
Published online August 1, 2024. https://doi.org/10.1007/s12555-023-0724-9
Copyright © The International Journal of Control, Automation, and Systems.
Jianjun Ni*, Yu Gu, Yang Gu*, Yonghao Zhao, and Pengfei Shi
Hohai University
In response to the increasingly complex problem of patrolling urban areas, the use of deep reinforcement learning algorithms for autonomous unmanned aerial vehicle (UAV) coverage path planning (CPP) has gradually become a research hotspot. A CPP solution must account for several complex factors, including the landing area, target area coverage, and limited battery capacity. Consequently, when environmental information is incomplete, policies learned by sample-inefficient deep reinforcement learning algorithms are prone to becoming trapped in local optima. To enhance the quality of the experience data, a novel reward is proposed to guide UAVs in efficiently traversing the target area under battery limitations. To further improve sample efficiency, this paper introduces a novel dynamic soft update method, incorporates the prioritized experience replay mechanism, and presents an improved deep double Q-network (IDDQN) algorithm. Finally, simulation experiments on two different grid maps demonstrate that IDDQN significantly outperforms DDQN. The proposed method simultaneously improves sample efficiency and safety performance, thereby enabling UAVs to cover a larger number of target areas.
Keywords: Battery limitation, coverage path planning, deep reinforcement learning, dynamic soft update.
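For readers unfamiliar with the building blocks named in the abstract, the sketch below illustrates a double DQN target computation together with a soft target-network update whose coefficient is varied during training. The linear tau schedule, network sizes, and hyperparameter values are illustrative assumptions only; the paper's actual dynamic soft update rule, reward design, and prioritized replay settings are not specified in this abstract.

```python
import torch
import torch.nn as nn

# Minimal sketch (assumed details, not the paper's implementation):
# a double DQN target plus a soft target-network update with a
# time-varying coefficient tau.

def make_q_net(n_states: int, n_actions: int) -> nn.Module:
    # Small illustrative Q-network; the paper's architecture is not given here.
    return nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(), nn.Linear(64, n_actions))

def ddqn_target(online: nn.Module, target: nn.Module,
                reward: torch.Tensor, next_state: torch.Tensor,
                done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    # Double DQN: the online network selects the next action,
    # the target network evaluates it.
    with torch.no_grad():
        next_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, next_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q

def dynamic_soft_update(online: nn.Module, target: nn.Module, step: int,
                        tau_start: float = 0.01, tau_end: float = 0.001,
                        decay_steps: int = 50_000) -> None:
    # Soft update: theta_target <- tau * theta_online + (1 - tau) * theta_target,
    # with tau annealed over training (hypothetical linear schedule).
    frac = min(step / decay_steps, 1.0)
    tau = tau_start + frac * (tau_end - tau_start)
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p_o.data)
```

In a prioritized experience replay setting, the temporal-difference error between `ddqn_target` and the online Q-value would typically set each transition's sampling priority; the exact prioritization scheme used by IDDQN is described in the full paper rather than here.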