
Federated Frank-Wolfe Algorithm

Ali Dadras, Sourasekhar Banerjee, Karthik Prakhya, Alp Yurtsever

TL;DR

This work addresses constrained optimization in federated learning (FL) by introducing FedFW, a projection-free method based on the Frank-Wolfe algorithm. FedFW replaces the hard consensus constraint with a smooth quadratic penalty and relies on a linear minimization oracle (LMO) to compute each client's update direction. The approach comes with convergence guarantees for smooth convex objectives at rate $O(t^{-1/2})$ and for smooth non-convex objectives, measured by the Frank-Wolfe (FW) gap, at rate $O(t^{-1/3})$; a stochastic variant achieves $O(t^{-1/3})$ in the convex setting. FedFW is designed to enhance data privacy by communicating only LMO outputs, which are sparse or low-rank for suitable constraint sets (for example, 1-sparse vectors for an $\ell_1$-norm ball and rank-one matrices for a nuclear-norm ball). It also supports practical extensions such as stochastic gradients, partial client participation, straggler-aware constraints, and an augmented Lagrangian variant (FedFW+). Experiments on convex multinomial logistic regression (MCLR) and non-convex CNN/DNN tasks, with both IID and non-IID data, show competitive performance with reduced communication, suggesting that FedFW is practical for scalable, privacy-preserving constrained FL.
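The penalty-plus-LMO mechanism described above can be illustrated with a short sketch. This is a minimal, hypothetical illustration rather than the authors' Algorithm 1: it assumes a least-squares loss on each client, an $\ell_1$-ball constraint, and a penalty gradient $\lambda_t(\mathbf{x}_i - \bar{\mathbf{x}})$ that pulls each client copy toward the server average; the names `lmo_l1` and `fedfw_sketch` are invented for this example. The schedules $\eta_t = 2/(t+1)$ and $\lambda_t = \lambda_0\sqrt{t+1}$ match those stated in Theorem 1.

```python
import numpy as np


def lmo_l1(g, radius):
    """Linear minimization oracle over the l1 ball of the given radius.

    Returns argmin_{||s||_1 <= radius} <g, s>, i.e. the signed vertex
    -radius * sign(g_j) * e_j with j = argmax_j |g_j| (a 1-sparse vector).
    """
    s = np.zeros_like(g)
    j = int(np.argmax(np.abs(g)))
    s[j] = -radius * np.sign(g[j])
    return s


def fedfw_sketch(A_list, b_list, radius=1.0, lam0=1.0, T=200):
    """Hypothetical penalty-based federated Frank-Wolfe loop (illustration only).

    Client i holds the assumed toy loss f_i(x) = 0.5 * ||A_i x - b_i||^2 and a
    local copy x_i; the consensus constraint x_i = x_bar is replaced by the
    quadratic penalty (lambda_t / 2) * ||x_i - x_bar||^2.
    """
    n = len(A_list)
    d = A_list[0].shape[1]
    x_local = [np.zeros(d) for _ in range(n)]  # feasible start: 0 lies in the l1 ball
    x_bar = np.zeros(d)                        # server-side average of the client copies
    for t in range(1, T + 1):
        eta = 2.0 / (t + 1)                    # step size as in Theorem 1
        lam = lam0 * np.sqrt(t + 1)            # increasing penalty as in Theorem 1
        messages = []
        for i in range(n):
            # Gradient of the local loss plus the penalty term.
            grad = A_list[i].T @ (A_list[i] @ x_local[i] - b_list[i])
            grad = grad + lam * (x_local[i] - x_bar)
            s = lmo_l1(grad, radius)           # only this 1-sparse vector is sent
            x_local[i] = (1 - eta) * x_local[i] + eta * s
            messages.append(s)
        # The server uses only the LMO outputs; this keeps x_bar == mean(x_local).
        x_bar = (1 - eta) * x_bar + eta * np.mean(messages, axis=0)
    return x_bar, x_local


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A_list = [rng.standard_normal((20, 10)) for _ in range(4)]
    x_true = np.zeros(10)
    x_true[[1, 5]] = [0.6, -0.4]
    b_list = [A @ x_true for A in A_list]
    x_bar, _ = fedfw_sketch(A_list, b_list, radius=1.0, T=500)
    print(np.round(x_bar, 2))
```

Because every update is a convex combination of feasible points, all iterates stay inside the $\ell_1$ ball without any projection, and each client transmits a single 1-sparse vector per round. Since the update is linear, the server can keep $\bar{\mathbf{x}}$ equal to the average of the client copies from those messages alone.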

Abstract

Federated learning (FL) has attracted considerable attention in recent years as a framework for building privacy-preserving collaborative learning systems. However, FL algorithms for constrained machine learning problems remain limited, particularly when the projection step is costly. To this end, we propose a Federated Frank-Wolfe Algorithm (FedFW). FedFW features data privacy, low per-iteration cost, and communication of sparse signals. In the deterministic setting, FedFW finds an $\varepsilon$-suboptimal solution within $O(\varepsilon^{-2})$ iterations for smooth and convex objectives, and an $\varepsilon$-stationary point (measured by the Frank-Wolfe gap) within $O(\varepsilon^{-3})$ iterations for smooth but non-convex objectives. Furthermore, we present a stochastic variant of FedFW and show that it finds an $\varepsilon$-suboptimal solution within $O(\varepsilon^{-3})$ iterations in the convex setting. We demonstrate the empirical performance of FedFW on several machine learning tasks.
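The iteration counts above follow from the corresponding rates by inverting the bound. As an illustrative assumption, suppose the relevant error measure (suboptimality in the convex case, the Frank-Wolfe gap in the non-convex case) is bounded by $C\,t^{-\alpha}$ for some problem-dependent constant $C > 0$; $C$ is a placeholder here, not a value taken from the paper. Requiring this bound to be at most $\varepsilon$ gives:

```latex
C\, t^{-\alpha} \le \varepsilon
\iff t^{\alpha} \ge \frac{C}{\varepsilon}
\iff t \ge \left(\frac{C}{\varepsilon}\right)^{1/\alpha} = C^{1/\alpha}\,\varepsilon^{-1/\alpha},
\qquad
\alpha = \tfrac{1}{2} \;\Rightarrow\; O(\varepsilon^{-2}),
\qquad
\alpha = \tfrac{1}{3} \;\Rightarrow\; O(\varepsilon^{-3}).
```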

Paper Structure (30 sections, 6 theorems, 71 equations, 3 figures, 3 tables, 1 algorithm)

Key Result

Theorem 1

Consider the constrained problem defined in the introduction, with $L$-smooth and convex loss functions $f_i$. Then the estimate $\bar{\mathbf{x}}^t$ generated by FedFW with step size $\eta_t = \frac{2}{t+1}$ and penalty parameter $\lambda_t = \lambda_0 \sqrt{t+1}$, for any $\lambda_0 > 0$, satisfies

Figures (3)

  • Figure 1: Privacy benefits of sharing linear minimization outputs instead of gradients. The Deep Leakage algorithm can recover CIFAR-100 data points from shared gradients, whereas sharing linear minimization outputs enhances privacy. (a) and (b) compare reconstructions from gradients and from LMO outputs with $\ell_2$- and $\ell_1$-norm ball constraints after $10^5$ iterations for two different data points. (c) and (d) show the reconstruction PSNR as a function of iterations for the corresponding images.
  • Figure 2: Effect of participation $\texttt{p}$ on FedFW. The experiment was conducted with MCLR using synthetic data, an $\ell_1$ constraint, and two different choices of $\lambda_0$.
  • Figure 4: Effect of the initial penalty ($\lambda_0$) on FedFW. (a) and (b) show the results for the convex setting; (c) and (d) show the results for the non-convex setting.
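To see what information an LMO output retains compared with a raw gradient, it helps to write down the two oracles behind the $\ell_2$ and $\ell_1$ panels of Figure 1. The sketch below uses the standard closed forms of the LMO over an $\ell_2$-norm ball and an $\ell_1$-norm ball; the function names are hypothetical, and the snippet neither reproduces the Deep Leakage experiment nor provides a formal privacy guarantee. The $\ell_2$ output keeps the gradient's direction but discards its magnitude, while the $\ell_1$ output keeps only the index and sign of the largest-magnitude entry.

```python
import numpy as np


def lmo_l2(g, radius):
    """argmin_{||s||_2 <= radius} <g, s> = -radius * g / ||g||_2."""
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(g)  # every feasible point is a minimizer; return 0
    return -radius * g / norm


def lmo_l1(g, radius):
    """argmin_{||s||_1 <= radius} <g, s>: a 1-sparse signed vertex of the ball."""
    s = np.zeros_like(g)
    j = int(np.argmax(np.abs(g)))
    s[j] = -radius * np.sign(g[j])
    return s


if __name__ == "__main__":
    g = np.array([0.3, -1.2, 0.5, 0.1])  # a stand-in for a client gradient
    # Rescaling the gradient leaves the l2 LMO output unchanged: its norm is hidden.
    print(np.allclose(lmo_l2(g, 1.0), lmo_l2(10.0 * g, 1.0)))  # True
    # The l1 LMO output reveals only one coordinate index and its sign.
    print(lmo_l1(g, 1.0))  # [0. 1. 0. 0.]
```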

Theorems & Definitions (11)

  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Theorem 3
  • Remark 3
  • Lemma 1
  • Proof
  • Lemma 2: Boundedness of the gradient
  • Proof
  • ...and 1 more