Nonlinear Perturbation-based Non-Convex Optimization over Time-Varying Networks

Mohammadreza Doostmohammadian, Zulfiya R. Gabidullina, Hamid R. Rabiee

TL;DR

The paper tackles distributed finite-sum optimization over time-varying networks with nonlinear, possibly non-ideal data exchanges. It introduces a single-timescale nonlinear perturbation-based gradient-tracking (NP-GT) algorithm in which an auxiliary variable tracks the global gradient while the nodes reach consensus, even under log-scale quantization and link failures. A perturbation-based convergence analysis shows that, for a sufficiently small step rate $\eta$, the system's eigenvalues remain in the left half-plane except for the zero eigenvalues associated with consensus, which guarantees convergence to the global optimizer. Simulations on convex and non-convex problems, including time-varying graphs and link failures, demonstrate robustness and efficiency, with convergence rates linked to the network's algebraic connectivity. The work provides a theoretically grounded framework for reliable distributed optimization under realistic communication constraints, applicable to federated learning, sensor networks, and multi-robot systems.

Abstract

Decentralized optimization strategies are useful in applications ranging from networked estimation to distributed machine learning. This paper studies finite-sum minimization problems defined over a network of nodes and proposes a computationally efficient algorithm that solves distributed convex problems and finds the optimal solution when the local objective functions are non-convex. In contrast to batch gradient optimization in some of the literature, our algorithm runs on a single time scale with no extra inner consensus loop, and it evaluates one gradient entry per node per time step. Further, the algorithm addresses link-level nonlinearity representing, for example, logarithmic quantization or clipping of the exchanged data. Leveraging perturbation-based theory and algebraic Laplacian network analysis, we prove optimal convergence and dynamics stability over time-varying and switching networks, where the time variation might be due to packet drops or link failures. Despite the nonlinear nature of the dynamics, we prove exact convergence under odd, sign-preserving, sector-bounded nonlinear data transmission over the links. Illustrative numerical simulations further highlight our contributions.
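
To make the single-time-scale idea concrete, here is a minimal Python sketch of a gradient-tracking iteration with quantized link exchanges. It is not the paper's NP-GT algorithm, whose equations are not reproduced on this page. The scalar quadratic local costs, the rounding-based log quantizer applied to link differences, the Euler step `h`, the parameter values, and the names `log_quantize` and `quantized_gradient_tracking` are all assumptions made for illustration.

```python
import numpy as np

def log_quantize(z, rho):
    """Odd, sign-preserving log-scale quantizer (assumed form); maps 0 to 0."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    nz = z != 0
    out[nz] = np.sign(z[nz]) * np.exp(rho * np.round(np.log(np.abs(z[nz])) / rho))
    return out

def quantized_gradient_tracking(a, b, graphs, h=0.1, eta=0.1, rho=0.25,
                                steps=2000, seed=0):
    """Minimise sum_i 0.5*a_i*(x - b_i)**2 over a switching undirected network.

    graphs: list of symmetric 0/1 adjacency matrices, cycled one per step.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    grad = lambda x: a * (x - b)              # local gradients, one per node
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=a.size)   # local estimates
    g = grad(x)
    y = g.copy()                              # auxiliary gradient tracker
    for k in range(steps):
        W = graphs[k % len(graphs)]
        # Each link carries a quantized difference. Because the quantizer is
        # odd and W is symmetric, these terms cancel in the network-wide sum,
        # so sum(y) keeps tracking the sum of local gradients.
        cx = (W * log_quantize(x[:, None] - x[None, :], rho)).sum(axis=1)
        cy = (W * log_quantize(y[:, None] - y[None, :], rho)).sum(axis=1)
        x = x - h * (cx + eta * y)            # consensus + descent, same time scale
        g_new = grad(x)                       # one new gradient per node per step
        y = y - h * cy + g_new - g
        g = g_new
    return x, y

# Four nodes: a complete graph whose two diagonal links fail on odd steps,
# leaving a ring (a toy model of link failure).
full = np.ones((4, 4)) - np.eye(4)
ring = full.copy()
ring[0, 2] = ring[2, 0] = ring[1, 3] = ring[3, 1] = 0.0
a, b = [1.0, 2.0, 1.5, 1.5], [1.0, -2.0, 3.0, 0.5]
x, y = quantized_gradient_tracking(a, b, [full, ring])
x_star = np.dot(a, b) / np.sum(a)             # centralized minimiser
print(x, x_star)
```

There is no inner consensus loop: each pass through the loop performs one quantized exchange and one gradient evaluation per node. The local estimates `x` should agree with `x_star`, and the tracker `y` should shrink toward zero.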

Paper Structure (11 sections, 4 theorems, 31 equations, 9 figures, 1 algorithm)


Key Result

Lemma 1

[stewart_book, cai2012average] Consider a matrix $A(\eta)$ of size $n$ that depends smoothly on the variable $\eta \geq 0$. Let $l \in \{1,\dots,n\}$ and let $\lambda_1,\dots,\lambda_l$ be semi-simple eigenvalues of the matrix $A^0$, with (linearly independent) right and left eigenvectors $\mathbf{v}_1,\dots$ […] Let $\lambda_i(\eta)$ be the $i$th eigenvalue of $A(\eta)$ corresponding to $\lambda_i$ […]
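
The TL;DR's claim that a small $\eta$ keeps the spectrum in the left half-plane apart from consensus-related zeros can be checked numerically on a toy case. The sketch below does not reconstruct the truncated lemma or the paper's matrices. It assumes a linearized gradient-tracking flow $\dot{x} = -Lx - \eta y$, $\dot{y} = -Ly + H\dot{x}$ on a three-node complete graph with diagonal Hessian $H$, and it uses the standard first-order result that the slopes $d\lambda/d\eta$ of a semi-simple eigenvalue are the eigenvalues of $W^\top A' V$ once $W^\top V = I$. The names `gt_matrix` and `first_order_slopes` are hypothetical.

```python
import numpy as np

def gt_matrix(L, H, eta):
    """A(eta) = [[-L, -eta*I], [-H@L, -L - eta*H]] for the assumed linearised flow."""
    n = L.shape[0]
    return np.block([[-L, -eta * np.eye(n)], [-H @ L, -L - eta * H]])

def first_order_slopes(A1, V, W):
    """Slopes d(lambda)/d(eta) at eta = 0 for a semi-simple eigenvalue.

    V, W hold right and left eigenvectors as columns; A1 is dA/d(eta).
    """
    W = W @ np.linalg.inv(V.T @ W)            # rescale so that W.T @ V = I
    return np.linalg.eigvals(W.T @ A1 @ V)

n = 3
L = n * np.eye(n) - np.ones((n, n))           # Laplacian of the complete graph K3
H = np.diag([1.0, 2.0, 3.0])                  # assumed local Hessians
one, zero = np.ones(n), np.zeros(n)
V = np.column_stack([np.r_[one, zero], np.r_[zero, one]])       # right null vectors of A(0)
W = np.column_stack([np.r_[one, zero], np.r_[-H @ one, one]])   # left null vectors of A(0)
A1 = gt_matrix(L, H, 1.0) - gt_matrix(L, H, 0.0)                # A is affine in eta
slopes = np.sort(first_order_slopes(A1, V, W).real)

eta = 1e-3
lam = np.linalg.eigvals(gt_matrix(L, H, eta))
near_zero = np.sort(lam[np.argsort(np.abs(lam))[:2]].real)      # the two perturbed zeros
print(slopes, near_zero / eta)
```

In this toy case one zero eigenvalue stays at zero and the other moves left at a rate set by the average Hessian, while the remaining eigenvalues stay near those of $-L$.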

Figures (9)

  • Figure 1: The left figure shows an undirected network that is both stochastic and weight-balanced (WB), with an unreliable link drawn in red. After the red link fails, the resulting network is no longer stochastic but is still WB, which shows that the WB condition is milder than the stochastic condition. Our optimization solution therefore converges over such unreliable networks, whereas many existing methods do not converge without redesigning the stochastic weights.
  • Figure 2: This figure shows logarithmic quantization as an example of a nonlinear mapping that satisfies the sector-bound condition. For the quantization level $\rho$, the lines with slopes $1 \pm \frac{\rho}{2}$ bound the nonlinear function.
  • Figure 3: This figure shows the perturbation-based bound on the optimal matching distance between the eigenspectra of $A_h$ and $A_h^0$. When $d(\sigma(A_h),\sigma(A^0_h)) < \kappa |\operatorname{Re}\{\lambda_{3,j}(\eta,t)\}|$, perturbation theory guarantees that the nonzero eigenvalues of $A_h$ remain in the left half-plane (LHP).
  • Figure 4: This figure shows the convergence of the local regressor parameters to the optimal centralized linear regressor. As illustrated, the agents reach consensus on the regression parameters $\beta_i,\nu_i$.
  • Figure 5: This figure compares the optimality gap of the objective function \ref{eq_fi} (linear regression) under different distributed optimization techniques. Assuming heterogeneous setups, each agent can access its local batch of data.
  • ...and 4 more figures
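
The claims in the captions of Figures 1 and 2 can be illustrated with a short Python sketch. The weight matrix, the reading of "stochastic" as rows and columns summing to one, the rounding-based quantizer, and the helper names are assumptions, not the paper's definitions. For this particular quantizer the ratio $q(z)/z$ lies in $[e^{-\rho/2}, e^{\rho/2}]$, which agrees with $1 \pm \frac{\rho}{2}$ to first order in $\rho$.

```python
import numpy as np

def log_quantize(z, rho):
    """Odd, sign-preserving log-scale quantizer (assumed form); maps 0 to 0."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    nz = z != 0
    out[nz] = np.sign(z[nz]) * np.exp(rho * np.round(np.log(np.abs(z[nz])) / rho))
    return out

def sector_ratio_range(z, rho):
    """Smallest and largest ratio q(z)/z over nonzero samples z."""
    z = np.asarray(z, dtype=float)
    r = log_quantize(z, rho) / z
    return r.min(), r.max()

def is_weight_balanced(W):
    """Each node's incoming weight sum equals its outgoing weight sum."""
    return bool(np.allclose(W.sum(axis=0), W.sum(axis=1)))

def is_stochastic(W):
    """Rows and columns all sum to one (assumed meaning of 'stochastic')."""
    return bool(np.allclose(W.sum(axis=0), 1.0) and np.allclose(W.sum(axis=1), 1.0))

def drop_link(W, i, j):
    """Return a copy of W with the undirected link (i, j) removed."""
    Wf = W.copy()
    Wf[i, j] = Wf[j, i] = 0.0
    return Wf

# Figure 2: the quantizer stays inside a sector around the identity map.
rho = 0.5
z = np.concatenate([-np.logspace(-6, 6, 2001), np.logspace(-6, 6, 2001)])
lo, hi = sector_ratio_range(z, rho)
print(lo, hi, np.exp(-rho / 2), np.exp(rho / 2))

# Figure 1: a 4-node ring with self-loops and equal weights 1/3.
ring = np.roll(np.eye(4), 1, axis=1) + np.roll(np.eye(4), -1, axis=1)
W = (np.eye(4) + ring) / 3.0
W_failed = drop_link(W, 0, 1)
print(is_stochastic(W), is_weight_balanced(W))
print(is_stochastic(W_failed), is_weight_balanced(W_failed))
```

Removing a link from a symmetric weight matrix keeps it symmetric, so row and column sums still match even though they no longer equal one. That is the sense in which the WB condition survives the failure without redesigning the weights.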

Theorems & Definitions (9)

  • Remark 1
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Remark 2
  • Remark 3