Quantization Avoids Saddle Points in Distributed Optimization

Yanan Bo, Yongqiang Wang

TL;DR

It is shown that quantization effects, which are unavoidable due to communications in distributed optimization and regarded as detrimental in existing studies, can be exploited to enable saddle-point avoidance for free.

Abstract

Distributed nonconvex optimization underpins key functionalities of numerous distributed systems, ranging from power systems, smart buildings, cooperative robots, and vehicle networks to sensor networks. Recently, it has also emerged as a promising solution to handle the enormous growth in data and model sizes in deep learning. A fundamental problem in distributed nonconvex optimization is avoiding convergence to saddle points, which significantly degrade optimization accuracy. We discover that the process of quantization, which is necessary for all digital communications, can be exploited to enable saddle-point avoidance. More specifically, we propose a stochastic quantization scheme and prove that it can effectively escape saddle points and ensure convergence to a second-order stationary point in distributed nonconvex optimization. With an easily adjustable quantization granularity, the approach allows a user to control the number of bits sent per iteration and, hence, to aggressively reduce the communication overhead. Numerical experiments using distributed optimization and learning problems on benchmark datasets confirm the effectiveness of the approach.

Paper Structure

This paper contains 20 sections, 5 theorems, 17 equations, 10 figures, and 1 algorithm.

Key Result

Lemma 1

For any $\boldsymbol{v} \in \mathbb{R}^d$, our quantization scheme $Q_\ell(\boldsymbol{v}) = [Q_\ell(v_1), Q_\ell(v_2), \ldots, Q_\ell(v_d)]$ has the following properties:

Figures (10)

  • Figure 1: The proposed quantization scheme with quantization interval $\ell$. The star represents a value to be quantized; it lies in the quantization interval $[0, \ell]$ under level-set 1 and in $[0.5\ell, 1.5\ell]$ under level-set 2. At any even-numbered iteration ($k$ is even), the star value is quantized to either $0$ or $\ell$, with the respective probabilities given in \eqref{Q_s1}. At any odd-numbered iteration ($k$ is odd), the star value is quantized to either $0.5\ell$ or $1.5\ell$, with the respective probabilities given in \eqref{Q_s2}.
  • Figure 2: An illustrative example of the stepsizes. The two solid curves represent two reference functions which are defined on the continuous time $t$. The blue and orange dots represent the values of stepsizes $\varepsilon_k$ and $\eta_k$ at discrete time instants $k$ (which are periodic samples of the continuous time $t$). The time instants $t_0, t_1, t_2, t_3$ are determined in the second step of the stepsize strategy. Before $t_0$, the descent of the stepsize sequences is aligned with the reference functions. In intervals $[t_i, t_{i+1})$, the stepsizes remain constant, as described in the third step of the stepsize strategy.
  • Figure 3: Interaction weights of five agents
  • Figure 4: Trajectories of all five agents when initialized on the saddle point (0,0). Note that all trajectories overlap with each other, implying perfect consensus among the agents.
  • Figure 5: Comparison of the objective function value between the proposed Algorithm \ref{alg:1} and the existing algorithm DGD in \cite{yuan2016convergence}.
  • ...and 5 more figures
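
The alternating level sets described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the paper's definition: the caption defers the quantization probabilities to equations that are not reproduced here, so the sketch assumes standard unbiased stochastic rounding (round up with probability equal to the fractional position inside the interval). The function name `stochastic_quantize` and its parameters are hypothetical.

```python
import numpy as np


def stochastic_quantize(v, ell, k, rng):
    """Stochastically quantize each entry of v onto a grid of spacing ell.

    Even k uses level-set 1: {..., 0, ell, 2*ell, ...}.
    Odd k uses level-set 2:  {..., 0.5*ell, 1.5*ell, ...}.

    ASSUMPTION: the round-up probability equals the fractional position of
    the value inside its interval (unbiased stochastic rounding). The paper's
    exact probabilities are given in equations not shown in this summary.
    """
    v = np.asarray(v, dtype=float)
    # Shift the grid by half an interval on odd iterations.
    offset = 0.0 if k % 2 == 0 else 0.5 * ell
    scaled = (v - offset) / ell
    lower = np.floor(scaled)   # index of the level just below each entry
    p_up = scaled - lower      # fractional position in [0, 1)
    go_up = rng.random(v.shape) < p_up
    return offset + ell * (lower + go_up)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    star = np.full(5, 0.7)     # a value between 0.5*ell and ell, with ell = 1
    print(stochastic_quantize(star, 1.0, k=0, rng=rng))  # entries are 0 or 1
    print(stochastic_quantize(star, 1.0, k=1, rng=rng))  # entries are 0.5 or 1.5
```

Under this assumption the quantized value equals the input in expectation at every iteration, while the injected randomness never vanishes at a fixed point of the grid because the grid itself shifts between even and odd iterations.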

Theorems & Definitions (7)

  • Definition 1
  • Definition 2
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Theorem 3: Escaping Saddle Points
  • Theorem 4