Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency
Qixin Zhang, Zongqi Wan, Yu Yang, Li Shen, Dacheng Tao
TL;DR
The paper addresses online coordination of multiple agents that maximize submodular objectives in unpredictable environments. It introduces MA-OSMA and MA-OSEA, which use the multi-linear extension and a curvature-aware surrogate gradient to achieve the tight $\frac{1-e^{-c}}{c}$-approximation on connected networks, relaxing the requirement of a fully connected communication graph. The authors establish a dynamic regret bound of $\widetilde{O}(\sqrt{C_T T/(1-\beta)})$ against a $\frac{1-e^{-c}}{c}$-approximation to the best comparator in hindsight, where $C_T$ is the deviation of the maximizer sequence, $\beta$ is the spectral gap of the network and $c$ is the joint curvature of the objectives, and they validate both algorithms in simulated multi-target tracking. The work advances decentralized online submodular optimization by combining consensus-based updates, a projection-free variant, and surrogate-gradient techniques, and it targets multi-agent systems that need communication-efficient coordination.
Abstract
Coordinating multiple agents to collaboratively maximize submodular functions in unpredictable environments is a critical task with numerous applications in machine learning, robot planning and control. Existing approaches, such as the OSG algorithm, are often hindered by their poor approximation guarantees and the rigid requirement for a fully connected communication graph. To address these challenges, we first present the $\textbf{MA-OSMA}$ algorithm, which employs the multi-linear extension to transform the discrete submodular maximization problem into a continuous optimization problem, thereby allowing us to reduce the strict dependence on a complete graph through consensus techniques. Moreover, $\textbf{MA-OSMA}$ leverages a novel surrogate gradient to avoid sub-optimal stationary points. To eliminate the computationally intensive projection operations in $\textbf{MA-OSMA}$, we also introduce a projection-free $\textbf{MA-OSEA}$ algorithm, which effectively utilizes the KL divergence by mixing in a uniform distribution. Theoretically, we confirm that both algorithms achieve a regret bound of $\widetilde{O}(\sqrt{\frac{C_{T}T}{1-\beta}})$ against a $(\frac{1-e^{-c}}{c})$-approximation to the best comparator in hindsight, where $C_{T}$ is the deviation of the maximizer sequence, $\beta$ is the spectral gap of the network and $c$ is the joint curvature of the submodular objectives. This result significantly improves on the $(\frac{1}{1+c})$-approximation provided by the state-of-the-art OSG algorithm. Finally, we demonstrate the effectiveness of our proposed algorithms through simulation-based multi-target tracking.
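To make the gap between the two approximation guarantees concrete, the following minimal sketch evaluates $\frac{1-e^{-c}}{c}$ and $\frac{1}{1+c}$ for a few curvature values. It is only an illustration of the two formulas stated in the abstract, not part of the authors' algorithms. The function names `tight_ratio` and `osg_ratio` are hypothetical, and the sketch assumes the curvature satisfies $c \in [0, 1]$, with the $c = 0$ case handled as the limit value 1.

```python
import math


def tight_ratio(c: float) -> float:
    """Ratio (1 - e^{-c}) / c from the abstract (hypothetical helper name).

    Assumes 0 <= c <= 1; at c = 0 the limit of the expression is 1.
    """
    if c == 0.0:
        return 1.0
    # -expm1(-c) equals 1 - e^{-c} and stays accurate for small c.
    return -math.expm1(-c) / c


def osg_ratio(c: float) -> float:
    """Ratio 1 / (1 + c) attributed to OSG in the abstract (hypothetical helper name)."""
    return 1.0 / (1.0 + c)


if __name__ == "__main__":
    # Tabulate both ratios over a few illustrative curvature values.
    for c in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"c={c:.2f}  tight={tight_ratio(c):.4f}  osg={osg_ratio(c):.4f}")
```

For every $c > 0$ the first ratio is strictly larger, since $(1+c)(1-e^{-c}) > c$ is equivalent to $e^{c} > 1 + c$. At the worst case $c = 1$ the two guarantees are $1 - 1/e \approx 0.632$ and $0.5$.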
