FedOSAA: Improving Federated Learning with One-Step Anderson Acceleration

Xue Feng, M. Paul Laiu, Thomas Strohmer

TL;DR

This work tackles slow convergence in federated learning caused by client drift and heterogeneous data. It introduces FedOSAA, which augments variance-reduced local updates with a one-step Anderson acceleration to capture curvature without Hessian access, effectively approximating the Newton-GMRES direction. The authors prove local linear convergence for smooth, strongly convex losses and show via experiments that FedOSAA accelerates convergence and matches Hessian-based methods like GIANT under realistic settings. The approach offers a simple, Hessian-free path to faster distributed optimization with strong practical potential for communication- and computation-efficient FL.

Abstract

Federated learning (FL) is a distributed machine learning approach that enables multiple local clients and a central server to collaboratively train a model while keeping the data on their own devices. First-order methods, particularly those incorporating variance reduction techniques, are the most widely used FL algorithms due to their simple implementation and stable performance. However, these methods tend to be slow and require a large number of communication rounds to reach the global minimizer. We propose FedOSAA, a novel approach that preserves the simplicity of first-order methods while achieving the rapid convergence typically associated with second-order methods. Our approach applies one Anderson acceleration (AA) step following classical local updates based on first-order methods with variance reduction, such as FedSVRG and SCAFFOLD, during local training. This AA step is able to leverage curvature information from the history points and gives a new update that approximates the Newton-GMRES direction, thereby significantly improving the convergence. We establish a local linear convergence rate to the global minimizer of FedOSAA for smooth and strongly convex loss functions. Numerical comparisons show that FedOSAA substantially improves the communication and computation efficiency of the original first-order methods, achieving performance comparable to second-order methods like GIANT.
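To make the idea of "one Anderson acceleration step after local first-order updates" concrete, the following is a minimal single-client sketch, not the authors' algorithm. It assumes plain full-batch gradient descent as the local update (no variance reduction, no server aggregation), a standard residual-minimizing (type-II) AA formula with unit mixing, and one extra gradient evaluation at the last iterate. The function name `local_gd_with_one_step_aa` and the diagonal quadratic test problem are hypothetical choices made for illustration.

```python
import numpy as np

def local_gd_with_one_step_aa(grad, w0, eta, num_steps):
    """Run `num_steps` gradient steps, then apply one Anderson acceleration step.

    Illustrative sketch only: generic type-II AA on the fixed-point map
    g(w) = w - eta * grad(w), using the whole local history.
    """
    iterates = [np.asarray(w0, dtype=float)]
    residuals = []
    for _ in range(num_steps):
        w = iterates[-1]
        r = -eta * grad(w)            # residual g(w) - w of the fixed-point map
        residuals.append(r)
        iterates.append(w + r)        # plain gradient descent step

    w_last = iterates[-1]
    r_last = -eta * grad(w_last)      # one extra gradient at the last iterate
    residuals.append(r_last)

    # Differences of consecutive iterates and residuals (columns = history).
    dW = np.diff(np.stack(iterates, axis=1), axis=1)
    dR = np.diff(np.stack(residuals, axis=1), axis=1)

    # Least-squares coefficients that minimize the combined residual norm.
    gamma, *_ = np.linalg.lstsq(dR, r_last, rcond=None)

    # Anderson-accelerated point built from the history.
    w_aa = w_last + r_last - (dW + dR) @ gamma
    return w_last, w_aa

# Hypothetical strongly convex quadratic: f(w) = 0.5 * w^T A w - b^T w.
A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.ones(5)
w_star = b / np.diag(A)               # exact minimizer for diagonal A

w_gd, w_aa = local_gd_with_one_step_aa(
    lambda w: A @ w - b, np.zeros(5), eta=0.1, num_steps=5
)
gd_err = np.linalg.norm(w_gd - w_star)
aa_err = np.linalg.norm(w_aa - w_star)
print(f"GD error: {gd_err:.3e}, GD + one AA step error: {aa_err:.3e}")
```

On a quadratic, the residual is an affine function of the iterate, so the least-squares solve picks the point in the span of the history with the smallest residual. This is the sense in which an AA step relates to a GMRES-type solve for the Newton direction. In this small example the history length equals the dimension, so the accelerated point should land essentially on the minimizer, while plain gradient descent is still far away.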

Paper Structure

This paper contains 21 sections, 3 theorems, 30 equations, 8 figures, 1 table, 2 algorithms.

Key Result

Lemma 3

Under Assumptions 1 and 2, given that $\|\nabla f(\boldsymbol{w}^t)\|$ is sufficiently small, we have $\delta^t_k \approx \sqrt{1 - \mu \eta} \, \theta^t_k \leq 1$. Further, if $f_k$ is quadratic, $\delta^t_k = \sqrt{1 - \mu \eta} \, \theta^t_k \leq 1$.

Figures (8)

  • Figure 1: Comparative analysis on the covtype dataset by varying the local learning rate $\eta$ (first column), the number of local epochs $L$ (second column), and the batch size $B_k$ (third column). The first row compares FedOSAA-SVRG with FedSVRG and Newton-GMRES, while the second row compares FedOSAA-SCAFFOLD with SCAFFOLD. The number of clients is set to $K = 100$, with $N_k = 5810$ data points on each client.
  • Figure 2: Comparative analysis on different datasets and data distributions, with $K = 10$ clients.
  • Figure 3: Comparison of FedAVG and FedOSAA-AVG on the covtype dataset, varying the local learning rate $\eta$ (first column) and the number of local epochs $L$ (second column). The number of clients is set to $K = 100$, with each client having $N_k = 5810$ data points.
  • Figure 4: Comparison on the covtype dataset under different values of $\gamma$ and different numbers of clients $K$. The first row fixes $\gamma = 0.01$; the second row fixes the number of clients at $K = 100$.
  • Figure 5: Comparison on the w8a dataset. The first row fixes $\gamma = 0.01$; the second row fixes the number of clients at $K = 16$.
  • ...and 3 more figures

Theorems & Definitions (5)

  • Lemma 3
  • Theorem 4: Quadratic loss
  • Theorem 5: General loss
  • proof
  • proof