Decentralized Finite-Sum Optimization over Time-Varying Networks

Dmitry Metelev; Savelii Chezhegov; Alexander Rogozin; Aleksandr Beznosikov; Alexander Sholokhov; Alexander Gasnikov; Dmitry Kovalev

TL;DR

The paper studies decentralized finite-sum optimization over time-varying networks for both strongly convex and nonconvex objectives. It introduces ADOM+VR for strongly convex problems and GT-PAGE for nonconvex problems, integrating variance reduction with gradient tracking to improve convergence under time-varying topologies. It also provides new lower bounds on both communication and stochastic oracle complexity, which clarify the fundamental limits of the problem and show where the proposed algorithms stand relative to static-network results. The theoretical guarantees are complemented by numerical experiments on LibSVM datasets that illustrate the practical effectiveness and trade-offs of the proposed methods on changing networks.

Abstract

We consider decentralized time-varying stochastic optimization problems where each of the functions held by the nodes has a finite-sum structure. Such problems can be efficiently solved using variance reduction techniques. Our aim is to explore the lower complexity bounds (for communication and for the number of stochastic oracle calls) and to find optimal algorithms. The paper studies strongly convex and nonconvex scenarios. To the best of our knowledge, variance-reduced schemes and lower bounds for time-varying graphs have not been studied in the literature. For nonconvex objectives, we obtain lower bounds and develop an optimal method, GT-PAGE. For strongly convex objectives, we propose the first decentralized time-varying variance-reduction method, ADOM+VR, and establish a lower bound in this scenario, highlighting the open question of matching the algorithm's complexity with the lower bound even in the static network case.
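To make the setting concrete, the following is a minimal sketch of gradient tracking combined with a PAGE-style variance-reduced gradient estimator on a graph that changes at every iteration. It is not the paper's GT-PAGE or ADOM+VR: the scalar quadratic objective, the four-node alternating topology, the step size, the refresh probability and every function name are assumptions chosen for illustration only.

```python
import random


def make_problem(m=5, seed=0):
    """Toy finite sum on 4 nodes: f_ij(x) = 0.5 * c_ij * (x - a_ij)^2 (assumed)."""
    rng = random.Random(seed)
    c = [[rng.uniform(0.5, 2.0) for _ in range(m)] for _ in range(4)]
    a = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(4)]
    return c, a


def full_grad(c_i, a_i, x):
    """Exact gradient of the local average F_i at x."""
    return sum(cj * (x - aj) for cj, aj in zip(c_i, a_i)) / len(c_i)


def mix(values, pairs):
    """Gossip step: each matched pair averages its values (doubly stochastic)."""
    out = list(values)
    for i, j in pairs:
        out[i] = out[j] = 0.5 * (values[i] + values[j])
    return out


def gt_page_sketch(c, a, steps=3000, lr=0.05, p=0.2, seed=1):
    rng = random.Random(seed)
    n, m = len(c), len(c[0])
    # Two alternating matchings: neither graph is connected on its own,
    # but their union over two consecutive rounds is a ring.
    topologies = [[(0, 1), (2, 3)], [(1, 2), (3, 0)]]
    x = [0.0] * n
    v = [full_grad(c[i], a[i], x[i]) for i in range(n)]  # local estimators
    y = list(v)                                          # gradient trackers
    for k in range(steps):
        pairs = topologies[k % 2]
        x_new = mix([x[i] - lr * y[i] for i in range(n)], pairs)
        v_new = []
        for i in range(n):
            if rng.random() < p:
                # Occasional refresh with the full local gradient.
                v_new.append(full_grad(c[i], a[i], x_new[i]))
            else:
                # Cheap update from one sampled summand:
                # grad f_ij(x_new) - grad f_ij(x_old) = c_ij * (x_new - x_old).
                j = rng.randrange(m)
                v_new.append(v[i] + c[i][j] * (x_new[i] - x[i]))
        # Tracking step keeps mean(y) equal to mean(v) at every iteration.
        y = mix([y[i] + v_new[i] - v[i] for i in range(n)], pairs)
        x, v = x_new, v_new
    return x


def minimizer(c, a):
    """Closed-form minimizer of the sum of all f_ij."""
    num = sum(cj * aj for ci, ai in zip(c, a) for cj, aj in zip(ci, ai))
    den = sum(cj for ci in c for cj in ci)
    return num / den
```

Calling `gt_page_sketch(*make_problem())` returns one iterate per node, which can be compared against `minimizer(...)`. Each round costs one communication with a single neighbor and, most of the time, one stochastic gradient difference per node; these are the two quantities the paper's complexity bounds count.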

Paper Structure (24 sections, 27 theorems, 251 equations, 2 figures, 2 tables, 2 algorithms)

Introduction
Decentralized optimization
Variance reduction
Decentralized optimization and Variance reduction
Related Work
Notation and Assumptions
Algorithms
Strongly Convex Case
Nonconvex Case
Lower Bounds
First-order Decentralized Algorithms
Strongly Convex Case
Nonconvex Case
Numerical experiments
Setup
...and 9 more sections

Key Result

Theorem 4.1

Let Assumptions assum:smoothness_f, assum:smoothness_F, assum:strong_convexity and assum:gossip_matrix_sequence hold, and let $b \geq \overline{L}/L$. Then Algorithm scary:alg (ADOM+VR) requires $N$ iterations to yield $x^N$ such that $\|x^N - x^*\|^2 \leq \varepsilon$, where

Figures (2)

Figure 1: Comparison of communication and oracle complexities of Algorithm \ref{scary:alg} (ADOM+VR), ADOM+, Accelerated-GT (Acc-GT) and Accelerated-VR-Extra (Acc-VR-Extra) on a logistic regression problem on LibSVM datasets.
Figure 2: Comparison of communication and oracle complexities of Algorithm \ref{alg:yup} (GT-PAGE), GT-SARAH and DESTRESS.

Theorems & Definitions (61)

Remark 2.1
Remark 3.4
Theorem 4.1
Corollary 4.2
Theorem 4.3
Remark 4.4
Corollary 4.5
Remark 4.6
Theorem 5.3
proof
...and 51 more
