Rate Analysis of Coupled Distributed Stochastic Approximation for Misspecified Optimization

Yaqun Yang; Jinlong Lei

TL;DR

This work tackles distributed optimization under parametric misspecification by introducing a Coupled Distributed Stochastic Approximation (CDSA) algorithm that jointly updates decision variables and learned parameters through local stochastic gradients and network-wide consensus. The authors derive explicit convergence guarantees, showing the mean-squared error of the decision variable decays at a network- and iteration-dependent rate: $\mathcal{O}\left(\frac{1}{nk}\right) + \mathcal{O}\left(\frac{1}{\sqrt{n}(1-\rho_w)}\right)\frac{1}{k^{1.5}} + \mathcal{O}\left(\frac{1}{(1-\rho_w)^2}\right)\frac{1}{k^2}$, and identify a transient time of $K_T=\mathcal{O}\left(\frac{n}{(1-\rho_w)^2}\right)$. The analysis isolates the baseline network-independent optimization effect from higher-order network-structure effects, showing that improved connectivity primarily accelerates the later terms. Empirical tests on ridge and logistic regression with various network topologies corroborate the theory and demonstrate practical applicability in distributed CPU-based settings.
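To make the relative size of the three terms concrete, the following minimal sketch evaluates each term of the stated bound and the transient-time scale. The constants `c1`, `c2`, `c3` hidden by the $\mathcal{O}(\cdot)$ notation are not given in the text, so they are set to 1 here purely as an assumption; the function names are hypothetical.

```python
def cdsa_rate_terms(n, rho_w, k, c1=1.0, c2=1.0, c3=1.0):
    """Three terms of the stated MSE bound.

    c1, c2, c3 stand in for the unknown O(.) constants (assumed to be 1).
    """
    gap = 1.0 - rho_w                              # spectral gap (1 - rho_w)
    t1 = c1 / (n * k)                              # network-independent term
    t2 = c2 / ((n ** 0.5) * gap * k ** 1.5)        # first network-dependent term
    t3 = c3 / (gap ** 2 * k ** 2)                  # second network-dependent term
    return t1, t2, t3


def transient_scale(n, rho_w):
    """Scale of the transient time K_T = n / (1 - rho_w)^2 (constant assumed 1)."""
    return n / (1.0 - rho_w) ** 2


if __name__ == "__main__":
    n, rho_w = 25, 0.9                             # illustrative values only
    k_t = transient_scale(n, rho_w)
    for k in (10, int(k_t), int(100 * k_t)):
        print(k, cdsa_rate_terms(n, rho_w, k))
```

With unit constants the three terms coincide exactly at $k = n/(1-\rho_w)^2$, and for larger $k$ the $\frac{1}{nk}$ term dominates, which is one way to read the transient time.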

Abstract

We consider an $n$-agent distributed optimization problem with imperfect information characterized in a parametric sense, where the unknown parameter can be obtained by solving a distinct distributed parameter learning problem. Although each agent has access only to its local parameter learning problem and its local computational problem, the agents aim to collaboratively minimize the average of their local cost functions. To address this problem, we propose a coupled distributed stochastic approximation algorithm in which every agent updates its current belief of the unknown parameter and its decision variable by a stochastic approximation method, and then averages the beliefs and decision variables of its neighbors over the network via a consensus protocol. Our interest lies in the convergence analysis of this algorithm. We quantitatively characterize the factors that affect the algorithm's performance and prove that the mean-squared error of the decision variable is bounded by $\mathcal{O}\left(\frac{1}{nk}\right)+\mathcal{O}\left(\frac{1}{\sqrt{n}(1-\rho_w)}\right)\frac{1}{k^{1.5}}+\mathcal{O}\left(\frac{1}{(1-\rho_w)^2}\right)\frac{1}{k^2}$, where $k$ is the iteration count and $(1-\rho_w)$ is the spectral gap of the network's weighted adjacency matrix. This reveals that the network connectivity, characterized by $(1-\rho_w)$, influences only the higher-order terms of the convergence rate, while the dominant rate matches that of the centralized algorithm. In addition, we show that the number of transient iterations needed to reach the dominant rate $\mathcal{O}\left(\frac{1}{nk}\right)$ is $\mathcal{O}\left(\frac{n}{(1-\rho_w)^2}\right)$. Numerical experiments are carried out to demonstrate the theoretical results by taking different CPUs as agents, which is more applicable to real-world distributed scenarios.
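The update pattern described in the abstract (a local stochastic approximation step for both the parameter belief and the decision variable, followed by neighbor averaging) can be sketched on a toy problem. Everything specific below is an assumption for illustration and is not taken from the paper: the scalar quadratic costs $h_i(\theta)=\tfrac12(\theta-b_i)^2$ and $f_i(x,\theta)=\tfrac12(x-a_i\theta)^2$, the ring network with equal weights, the step size $1/(k+2)$, and the Gaussian gradient noise. For this toy problem the targets are $\theta^*=\bar b$ and $x^*=\bar a\,\theta^*$.

```python
import random


def ring_weights(n):
    """Doubly stochastic weights on a ring: 1/3 for self and each neighbor (n >= 3)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def mix(W, values):
    """One consensus round: each agent takes a weighted average over its neighbors."""
    return [sum(w * v for w, v in zip(row, values)) for row in W]


def run_cdsa(a, b, W, iters=5000, noise=0.1, seed=0):
    """Toy coupled scheme: noisy local gradient steps, then averaging (hypothetical setup)."""
    rng = random.Random(seed)
    n = len(a)
    x = [0.0] * n
    theta = [0.0] * n
    for k in range(iters):
        step = 1.0 / (k + 2)                       # assumed diminishing step size
        # Local stochastic step on the parameter learning problem h_i.
        theta_half = [theta[i] - step * ((theta[i] - b[i]) + noise * rng.gauss(0, 1))
                      for i in range(n)]
        # Local stochastic step on f_i, evaluated at the current belief theta_i.
        x_half = [x[i] - step * ((x[i] - a[i] * theta[i]) + noise * rng.gauss(0, 1))
                  for i in range(n)]
        # Consensus averaging of both beliefs and decision variables.
        theta = mix(W, theta_half)
        x = mix(W, x_half)
    return x, theta


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0]
    b = [0.5, 1.0, 1.5, 2.0, 2.5]
    x, theta = run_cdsa(a, b, ring_weights(len(a)))
    print("x:", x)          # each entry should approach mean(a) * mean(b)
    print("theta:", theta)  # each entry should approach mean(b)
```

The decision-variable step deliberately uses each agent's current belief `theta[i]` rather than the true parameter, which is the coupling that makes the problem misspecified.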

Paper Structure

This paper contains 25 sections, 14 theorems, 103 equations, 3 figures, 2 tables, 1 algorithm.

Introduction
Problem Formulation
Prior Work
Gaps and Motivation
Outline and Contributions
Algorithm and Assumptions
Algorithm Set Up
Assumptions
Auxiliary Results
Preliminary Bound
Supporting Lemmas
Uniform Bound
Main Results
Sublinear Convergence
Rate Estimate
...and 10 more sections

Key Result

Lemma 2.2.1

[qu2017harnessing] For any $x\in\mathbb{R}^p$, define $x^+=x-\alpha \nabla f(x)$. Suppose that $f$ is strongly convex with constant $\mu$ and that its gradient is Lipschitz continuous with constant $L$. If $\alpha\in(0,2/L)$, then $\|x^+-x_*\|\leq \lambda\|x-x_*\|$, where $\lambda\triangleq$ …
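The contraction property can be checked numerically on a simple case. The definition of $\lambda$ is cut off in the statement above, so the sketch below assumes the standard factor $\lambda=\max(|1-\alpha\mu|,|1-\alpha L|)$ for gradient descent on a $\mu$-strongly convex, $L$-smooth function; this may differ from the paper's exact statement. The diagonal quadratic test function and all names are hypothetical.

```python
import math


def grad_step(x, curvatures, alpha):
    """One gradient step on f(x) = 0.5 * sum(d_i * x_i^2), whose minimizer is x_* = 0."""
    return [xi - alpha * d * xi for xi, d in zip(x, curvatures)]


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def contraction_factor(alpha, mu, L):
    """Assumed standard factor; not read from the truncated lemma."""
    return max(abs(1.0 - alpha * mu), abs(1.0 - alpha * L))


if __name__ == "__main__":
    mu, L = 1.0, 4.0
    curvatures = [1.0, 2.5, 4.0]                   # eigenvalues of the Hessian, within [mu, L]
    x = [3.0, -2.0, 1.0]
    for alpha in (0.1, 0.25, 0.4, 0.49):           # all inside (0, 2/L) = (0, 0.5)
        lam = contraction_factor(alpha, mu, L)
        ratio = norm(grad_step(x, curvatures, alpha)) / norm(x)
        print(alpha, lam, ratio)                   # ratio should not exceed lam
```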

Figures (3)

Figure 1: The problem setup: a connected network of communicating agents, where each agent holds a local learning problem $h_i$ and a computational problem $f_i$ that is coupled with $h_i$ through the unknown parameter $\theta$, and the agents cooperate to solve the distributed coupled optimization problem.
Figure 2: Performance of CDSA under the path-graph and complete-graph topologies. The results are averaged over 200 Monte Carlo samples.
Figure 3: Performance of CDSA with $25$ agents under the four topologies listed in the referenced table, for binary classification via logistic regression. The results are averaged over 200 Monte Carlo samples.
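To see why the path and complete graphs compared in Figure 2 sit at opposite ends of the connectivity range, one can estimate $\rho_w$ for each. The sketch below is illustrative only: it assumes Metropolis weights (the weights used in the paper's experiments are not stated here) and takes $\rho_w$ to be the spectral norm of $W-\frac{1}{n}\mathbf{1}\mathbf{1}^\top$, which is the usual convention in this literature but is an assumption about the paper's definition. The norm is estimated by plain power iteration, which is valid because $W$ is symmetric.

```python
import math


def path_graph(n):
    adj = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1
    return adj


def complete_graph(n):
    return [[int(i != j) for j in range(n)] for i in range(n)]


def metropolis_weights(adj):
    """Symmetric doubly stochastic weights: w_ij = 1 / (1 + max(deg_i, deg_j))."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j]:
                W[i][j] = 1.0 / (1 + max(deg[i], deg[j]))
        W[i][i] = 1.0 - sum(W[i])                  # diagonal absorbs the remainder
    return W


def rho_w(W, iters=3000):
    """Power-iteration estimate of the spectral norm of W - (1/n) 1 1^T."""
    n = len(W)
    M = [[W[i][j] - 1.0 / n for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]           # deterministic starting vector
    est = 0.0
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        est = math.sqrt(sum(t * t for t in u))
        if est < 1e-12:                            # M is numerically zero
            return 0.0
        v = [t / est for t in u]
    return est


if __name__ == "__main__":
    n = 10
    for name, adj in (("path", path_graph(n)), ("complete", complete_graph(n))):
        rho = rho_w(metropolis_weights(adj))
        print(name, "rho_w =", rho, "spectral gap =", 1.0 - rho)
```

Under these assumptions the complete graph gives $W=\frac{1}{n}\mathbf{1}\mathbf{1}^\top$ and hence $\rho_w=0$, while the path graph's spectral gap shrinks as $n$ grows, which enlarges the network-dependent terms of the bound.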

Theorems & Definitions (16)

Lemma 2.2.1
Lemma 2.2.2
Lemma 3.1.1
Lemma 3.1.2
Lemma 3.2.1
Lemma 3.2.2
Remark 1
Lemma 3.3.1
Lemma 3.3.2
Lemma 4.1.1
...and 6 more
