On Linear Convergence of Distributed Stochastic Bilevel Optimization over Undirected Networks via Gradient Aggregation
Ajay Tak, Mayank Baranwal
TL;DR
This work tackles distributed bilevel optimization over undirected networks by introducing the BDASG algorithm, which couples gradient aggregation with consensus so that agents can solve distributed bilevel problems without a central coordinator. The main theoretical contribution is a proof of linear convergence in expectation to a neighborhood of the global optimum when the aggregate objective $f(x)=\sum_{i=1}^n f_i(x)$ is strongly convex, without requiring each local function $f_i$ to be convex; the authors also discuss the plausibility of linear convergence under the Polyak-Łojasiewicz (PL) condition. The analysis decouples consensus from optimization: it proves a gradient-reduction property for $Y$ and a PL-based contraction for the averaged iterate $\bar{x}$, with the contraction rate governed by the step size and the network's spectral properties through $\sigma_2$. Numerical experiments on distributed sensor networks and on distributed linear regression with rank-deficient data validate the method and show robust, scalable performance on undirected networks. The work thus broadens the applicability of distributed bilevel optimization under minimal structural assumptions and provides a practical algorithm for large-scale networked learning and sensing.
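The consensus-plus-aggregation mechanism can be illustrated with a minimal sketch. It is not BDASG: it assumes a standard single-level gradient-tracking recursion, drops the bilevel structure, and uses a hypothetical 4-agent ring with Metropolis mixing weights and hypothetical quadratic local costs $f_i(x)=\tfrac12 a_i (x-b_i)^2$. It also assumes that $\sigma_2$ denotes the second-largest eigenvalue magnitude of the mixing matrix $W$. The helper names `ring_metropolis_weights`, `ring_sigma2` and `gradient_tracking` are illustrative only. Setting `noise_std > 0` replaces exact gradients with noisy ones, in which case the iterates should only be expected to settle in a neighborhood of the optimum.

```python
import math
import random

# Hypothetical illustration: plain gradient tracking (NOT the paper's BDASG),
# single-level, scalar decision variable, ring network with Metropolis weights.


def ring_metropolis_weights(n):
    """Metropolis mixing matrix for an n-node ring (n >= 3, every node has degree 2)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            W[i][j] = 1.0 / 3.0  # Metropolis weight 1 / (1 + max(deg_i, deg_j)) with deg = 2
        W[i][i] = 1.0 - sum(W[i])  # self-weight makes each row (and, by symmetry, column) sum to 1
    return W


def ring_sigma2(n):
    """Second-largest eigenvalue magnitude of the ring Metropolis matrix (assumed meaning of sigma_2)."""
    # W is circulant with 1/3 on the diagonal and on both ring neighbours, so its eigenvalues
    # are 1/3 + (2/3) cos(2*pi*k/n); k = 0 gives the consensus eigenvalue 1 and is skipped.
    return max(abs(1 / 3 + (2 / 3) * math.cos(2 * math.pi * k / n)) for k in range(1, n))


def gradient_tracking(W, a, b, alpha, iters, noise_std=0.0, seed=0):
    """Agent i holds f_i(x) = 0.5 * a_i * (x - b_i)**2; returns the final local iterates x_i."""
    n = len(a)
    rng = random.Random(seed)

    def local_grads(x):
        # Exact local gradients, plus optional Gaussian noise to mimic a stochastic oracle.
        return [a[i] * (x[i] - b[i]) + rng.gauss(0.0, noise_std) for i in range(n)]

    x = [0.0] * n
    g = local_grads(x)
    y = g[:]  # tracker starts at the local gradients; doubly stochastic W keeps mean(y) == mean(g)
    for _ in range(iters):
        # Consensus on x, then a descent step along the tracked aggregate-gradient estimate.
        x_new = [sum(W[i][j] * x[j] for j in range(n)) - alpha * y[i] for i in range(n)]
        g_new = local_grads(x_new)
        # Mix the trackers with neighbours and add the change in each local gradient.
        y = [sum(W[i][j] * y[j] for j in range(n)) + g_new[i] - g[i] for i in range(n)]
        x, g = x_new, g_new
    return x


# Hypothetical data: a_2 = -1 makes f_2 concave, but sum(a) = 5 > 0, so f = sum_i f_i is strongly convex.
a = [3.0, -1.0, 2.0, 1.0]
b = [1.0, 4.0, -2.0, 0.5]
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)  # minimiser of f

W = ring_metropolis_weights(len(a))
x_final = gradient_tracking(W, a, b, alpha=0.02, iters=3000)
print("sigma_2 =", ring_sigma2(len(a)))
print("max |x_i - x*| =", max(abs(xi - x_star) for xi in x_final))
```

In this toy instance the exact-gradient run reaches the minimizer $x^\star=\sum_i a_i b_i/\sum_i a_i$ at a linear rate even though one local cost is concave. The step size `alpha=0.02` is a conservative choice; in gradient-tracking analyses the admissible step size typically shrinks as $\sigma_2$ approaches 1, i.e., as the network becomes more poorly connected.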
Abstract
Many large-scale constrained optimization problems can be formulated as bilevel distributed optimization tasks over undirected networks, in which agents collaborate to minimize a global cost function subject to constraints while relying only on local communication and computation. In this work, we propose a distributed stochastic gradient aggregation scheme and establish its linear convergence under the weak assumption of global strong convexity, which relaxes the common requirement that the local objective and constraint functions be convex. Specifically, we prove that the algorithm converges at a linear rate when the global objective function (and not necessarily each local objective function) is strongly convex. Our results significantly extend existing theoretical guarantees for distributed bilevel optimization. Additionally, we demonstrate the effectiveness of our approach through numerical experiments on distributed sensor network problems and on distributed linear regression with rank-deficient data.
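A small worked example (hypothetical, not taken from the paper) shows why assuming strong convexity of the aggregate is weaker than assuming convexity of every local function. Take scalar quadratic local costs

$$
f_i(x) = \tfrac{1}{2} a_i (x - b_i)^2, \qquad f(x) = \sum_{i=1}^{n} f_i(x), \qquad f''(x) = \sum_{i=1}^{n} a_i .
$$

With $n = 4$ and $(a_1, a_2, a_3, a_4) = (3, -1, 2, 1)$, the local cost $f_2$ is concave, yet $f''(x) = 5$ for all $x$, so $f$ is $5$-strongly convex with unique minimizer

$$
x^\star = \frac{\sum_{i=1}^{n} a_i b_i}{\sum_{i=1}^{n} a_i}.
$$

An assumption requiring every $f_i$ to be convex excludes this instance, whereas an assumption placed only on the aggregate $f$ is satisfied.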
