International Journal of Control, Automation, and Systems 2024; 22(5): 1461-1471
https://doi.org/10.1007/s12555-023-0211-3
© The International Journal of Control, Automation, and Systems
This paper investigates the distributed online convex optimization problem in multi-agent systems, where each node cannot directly access the gradient information of its own cost function. The communication topology is formed by the strongly connected time-varying directed graphs with the column stochastic weight matrices, where each node updates its own decisions by exchanging information with neighbouring nodes. It is not feasible to sample objective function values at several consecutive points simultaneously since the online setting is timevarying. To solve this problem over directed graphs, a push-sum one-point bandit distributed dual averaging (PSOBDDA) algorithm is proposed, where the one-point gradient estimator is employed to estimate the true gradient information, to guide the updating of the decision variables. Moreover, by selecting the appropriate exploration parameter δ and step sizes α(t), the algorithm is shown to achieve the sublinear regret bound with the convergencerate O(T56). Furthermore, the effect of one-point estimation parameters on the regret of the algorithm in online settings is explored. Finally, the performance of the algorithm is evaluated through simulation.
Keywords Bandit feedback, distributed online optimization, dual average, push-sum protocol.
International Journal of Control, Automation, and Systems 2024; 22(5): 1461-1471
Published online May 1, 2024 https://doi.org/10.1007/s12555-023-0211-3
Copyright © The International Journal of Control, Automation, and Systems.
Ju Yang, MengliWei, YanWang, and Zhongyuan Zhao*
zhaozhongyuan@nuist.edu.cn
This paper investigates the distributed online convex optimization problem in multi-agent systems, where each node cannot directly access the gradient information of its own cost function. The communication topology is formed by the strongly connected time-varying directed graphs with the column stochastic weight matrices, where each node updates its own decisions by exchanging information with neighbouring nodes. It is not feasible to sample objective function values at several consecutive points simultaneously since the online setting is timevarying. To solve this problem over directed graphs, a push-sum one-point bandit distributed dual averaging (PSOBDDA) algorithm is proposed, where the one-point gradient estimator is employed to estimate the true gradient information, to guide the updating of the decision variables. Moreover, by selecting the appropriate exploration parameter δ and step sizes α(t), the algorithm is shown to achieve the sublinear regret bound with the convergencerate O(T56). Furthermore, the effect of one-point estimation parameters on the regret of the algorithm in online settings is explored. Finally, the performance of the algorithm is evaluated through simulation.
Keywords: Bandit feedback, distributed online optimization, dual average, push-sum protocol.
Vol. 23, No. 1, pp. 1~88