Fractional Order Distributed Optimization

Andrei Lixandru, Marcel van Gerven, Sergio Pequito

TL;DR

The paper tackles slow and unstable convergence in distributed optimization on directed graphs with ill-conditioned objectives. It introduces FrODO, a framework that integrates fractional-order memory into local gradient updates, yielding updates of the form $x_i^{k+1} = x_i^k - \alpha g_i^k - \beta M_i^k$ where $M_i^k$ aggregates past gradients with a power-law weight. A convergence theorem shows linear convergence $O(\rho^k)$ for appropriate parameters and a memory effect captured by $C(\lambda)$, complemented by complexity analysis and experiments demonstrating substantial speedups in ill-conditioned problems (up to ~4x) and federated neural network training (2–3x) while preserving stability. The work provides practical guidelines for parameter choices (e.g., $\lambda$ around 0.1–0.2 and memory length $T\ge 80$) and suggests broader applicability to distributed control and multi-agent learning where long-term memory can stabilize optimization trajectories.
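The update form above can be illustrated with a minimal single-agent sketch. Several parts are assumptions rather than details from the paper: the weight formula $w_j \propto (j+1)^{-(1+\lambda)}$, the normalization, the step sizes `alpha` and `beta`, the test quadratic, and the helper names `power_law_weights`, `local_step`, and `run_demo` are all hypothetical. The consensus step over the directed graph is omitted because this summary does not specify it. Only $\lambda = 0.15$ and $T = 80$ follow the stated guidelines.

```python
from collections import deque


def power_law_weights(T, lam):
    """Assumed weights w_j proportional to (j + 1) ** -(1 + lam) for lags j = 1..T."""
    raw = [(j + 1) ** -(1.0 + lam) for j in range(1, T + 1)]
    total = sum(raw)
    return [w / total for w in raw]  # normalized so the weights sum to 1


def local_step(x, grad, history, weights, alpha, beta):
    """Return x - alpha * grad - beta * M, where M weights past gradients by lag."""
    memory = [0.0] * len(x)
    # history[0] is the most recent past gradient (lag 1), history[1] is lag 2, ...
    for w, past in zip(weights, history):
        for d in range(len(x)):
            memory[d] += w * past[d]
    x_next = [x[d] - alpha * grad[d] - beta * memory[d] for d in range(len(x))]
    history.appendleft(list(grad))  # a deque with maxlen=T discards the oldest entry
    return x_next


def run_demo(steps=200, T=80, lam=0.15, alpha=0.009, beta=0.005):
    """Minimize the ill-conditioned quadratic 0.5 * (x0**2 + 100 * x1**2)."""
    weights = power_law_weights(T, lam)
    history = deque(maxlen=T)
    x = [1.0, 1.0]
    for _ in range(steps):
        grad = [x[0], 100.0 * x[1]]  # gradient of the assumed test objective
        x = local_step(x, grad, history, weights, alpha, beta)
    return x
```

Calling `run_demo()` returns the iterate after 200 steps. With these illustrative step sizes both coordinates shrink toward the minimizer at the origin, but the sketch is not meant to reproduce the speedups reported in the paper.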

Abstract

Distributed optimization is fundamental to modern machine learning applications like federated learning, but existing methods often struggle with ill-conditioned problems and face stability-versus-speed tradeoffs. We introduce fractional order distributed optimization (FrODO), a theoretically grounded framework that incorporates fractional-order memory terms to enhance convergence properties in challenging optimization landscapes. Our approach achieves provable linear convergence for any strongly connected network. Through empirical validation, our results suggest that FrODO achieves up to 4 times faster convergence than baselines on ill-conditioned problems and a 2-3 times speedup in federated neural network training, while maintaining stability and theoretical guarantees.

Paper Structure

This paper contains 8 sections, 2 theorems, 3 equations, 1 figure, 1 algorithm.

Key Result

Theorem 2.1

Consider a network of $N$ agents with a strongly connected, directed communication graph $G$. Let each agent $i$ have a local objective function $f_i(x)$ that is $\mu$-strongly convex and $L$-smooth. For appropriate choices of parameters $\alpha$, $\beta$, and $T$, and fractional order $\lambda \in$ […], FrODO converges with a linear convergence rate $O(\rho^k)$, where $\rho = \max\{|1 - \alpha\mu|, |1 - \alpha L|\}$ […]
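To get a feel for the rate expression in Theorem 2.1, the sketch below evaluates only the part that is visible here, $\max\{|1 - \alpha\mu|, |1 - \alpha L|\}$. The statement is cut off in this summary, so the full $\rho$ may contain further terms (for example, one involving the memory). The function names and the numeric values of $\mu$, $L$, and the tolerance are illustrative assumptions.

```python
import math


def base_contraction_factor(alpha, mu, L):
    """Visible part of rho from Theorem 2.1: max(|1 - alpha*mu|, |1 - alpha*L|)."""
    return max(abs(1.0 - alpha * mu), abs(1.0 - alpha * L))


def iterations_to_tolerance(rho, eps):
    """Smallest k with rho**k <= eps, valid for 0 < rho < 1 and 0 < eps < 1."""
    return math.ceil(math.log(eps) / math.log(rho))


# Hypothetical ill-conditioned problem with condition number L / mu = 100.
mu, L = 1.0, 100.0
alpha = 2.0 / (mu + L)  # step size that minimizes the expression above
rho = base_contraction_factor(alpha, mu, L)  # equals (L - mu) / (L + mu), about 0.98
k = iterations_to_tolerance(rho, 1e-6)
```

A factor this close to 1 means many iterations are needed to reach a tight tolerance, which is the slow-convergence regime that the memory term is meant to improve.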

Figures (1)

  • Figure 1: Results for Experiment 1 (left) and Experiment 2 (right).

Theorems & Definitions (4)

  • Theorem 2.1: Convergence of FrODO
  • proof
  • Theorem 2.2: Computational Complexity
  • proof