Regular Papers

International Journal of Control, Automation, and Systems 2024; 22(5): 1461-1471

https://doi.org/10.1007/s12555-023-0211-3

© The International Journal of Control, Automation, and Systems

Push-sum Distributed Dual Averaging Online Convex Optimization With Bandit Feedback

Ju Yang, MengliWei, YanWang, and Zhongyuan Zhao*

zhaozhongyuan@nuist.edu.cn

Abstract

This paper investigates the distributed online convex optimization problem in multi-agent systems, where each node cannot directly access the gradient information of its own cost function. The communication topology is formed by the strongly connected time-varying directed graphs with the column stochastic weight matrices, where each node updates its own decisions by exchanging information with neighbouring nodes. It is not feasible to sample objective function values at several consecutive points simultaneously since the online setting is timevarying. To solve this problem over directed graphs, a push-sum one-point bandit distributed dual averaging (PSOBDDA) algorithm is proposed, where the one-point gradient estimator is employed to estimate the true gradient information, to guide the updating of the decision variables. Moreover, by selecting the appropriate exploration parameter δ and step sizes α(t), the algorithm is shown to achieve the sublinear regret bound with the convergencerate O(T56). Furthermore, the effect of one-point estimation parameters on the regret of the algorithm in online settings is explored. Finally, the performance of the algorithm is evaluated through simulation.

Keywords Bandit feedback, distributed online optimization, dual average, push-sum protocol.

Article

Regular Papers

International Journal of Control, Automation, and Systems 2024; 22(5): 1461-1471

Published online May 1, 2024 https://doi.org/10.1007/s12555-023-0211-3

Copyright © The International Journal of Control, Automation, and Systems.

Push-sum Distributed Dual Averaging Online Convex Optimization With Bandit Feedback

Ju Yang, MengliWei, YanWang, and Zhongyuan Zhao*

zhaozhongyuan@nuist.edu.cn

Abstract

This paper investigates the distributed online convex optimization problem in multi-agent systems, where each node cannot directly access the gradient information of its own cost function. The communication topology is formed by the strongly connected time-varying directed graphs with the column stochastic weight matrices, where each node updates its own decisions by exchanging information with neighbouring nodes. It is not feasible to sample objective function values at several consecutive points simultaneously since the online setting is timevarying. To solve this problem over directed graphs, a push-sum one-point bandit distributed dual averaging (PSOBDDA) algorithm is proposed, where the one-point gradient estimator is employed to estimate the true gradient information, to guide the updating of the decision variables. Moreover, by selecting the appropriate exploration parameter δ and step sizes α(t), the algorithm is shown to achieve the sublinear regret bound with the convergencerate O(T56). Furthermore, the effect of one-point estimation parameters on the regret of the algorithm in online settings is explored. Finally, the performance of the algorithm is evaluated through simulation.

Keywords: Bandit feedback, distributed online optimization, dual average, push-sum protocol.

IJCAS
May 2024

Vol. 22, No. 5, pp. 1461~1759

Stats or Metrics

Share this article on

  • line

IJCAS

eISSN 2005-4092
pISSN 1598-6446