Federated Frank-Wolfe Algorithm

Ali Dadras; Sourasekhar Banerjee; Karthik Prakhya; Alp Yurtsever

TL;DR

This work addresses constrained optimization in Federated Learning by introducing FedFW, a projection-free Frank-Wolfe-based method that replaces hard consensus with a smooth quadratic penalty and relies on a linear minimization oracle to update per-client directions. The approach yields convergence guarantees for smooth convex objectives at rate O(t^{-1/2}) and for non-convex objectives via the FW gap at rate O(t^{-1/3}), with a stochastic variant achieving O(t^{-1/3}) in the convex setting. FedFW preserves data privacy by communicating only LMO outputs, which correspond to sparse or low-rank signals, and supports practical enhancements such as stochastic gradients, partial participation, straggler-aware constraints, and an augmented Lagrangian version (FedFW+). Empirical results on convex MCLR and non-convex CNN/DNN tasks across IID and non-IID data demonstrate competitive performance with reduced communication, highlighting FedFW’s practicality for scalable, privacy-preserving constrained FL.
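To illustrate why LMO outputs can be sparse, here is a minimal sketch of a linear minimization oracle over an $\ell_1$-norm ball, one of the constraint sets used in the experiments. The function name `lmo_l1_ball` and its `radius` parameter are illustrative assumptions, not the authors' code. The oracle returns a vector with at most one nonzero entry, so a client could send just an index and a sign.

```python
def lmo_l1_ball(grad, radius=1.0):
    """Return a minimizer of <grad, s> over the set {s : ||s||_1 <= radius}.

    The minimum of a linear function over the l1 ball is attained at a
    vertex: a signed, scaled coordinate vector. Hypothetical helper for
    illustration only.
    """
    # Coordinate with the largest gradient magnitude (first one on ties).
    k = max(range(len(grad)), key=lambda j: abs(grad[j]))
    s = [0.0] * len(grad)
    if grad[k] != 0.0:
        # Move against the sign of the gradient along coordinate k.
        s[k] = -radius if grad[k] > 0 else radius
    return s


if __name__ == "__main__":
    print(lmo_l1_ball([0.5, -2.0, 1.0], radius=3.0))
```

No projection is computed: the oracle only needs the largest-magnitude gradient coordinate, which is what keeps the per-iteration cost low for this constraint set.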

Abstract

Federated learning (FL) has gained a lot of attention in recent years for building privacy-preserving collaborative learning systems. However, FL algorithms for constrained machine learning problems are still limited, particularly when the projection step is costly. To this end, we propose a Federated Frank-Wolfe Algorithm (FedFW). FedFW features data privacy, low per-iteration cost, and communication of sparse signals. In the deterministic setting, FedFW achieves an $\varepsilon$-suboptimal solution within $O(\varepsilon^{-2})$ iterations for smooth and convex objectives, and $O(\varepsilon^{-3})$ iterations for smooth but non-convex objectives. Furthermore, we present a stochastic variant of FedFW and show that it finds a solution within $O(\varepsilon^{-3})$ iterations in the convex setting. We demonstrate the empirical performance of FedFW on several machine learning tasks.
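The iteration counts in the abstract and the rates in the TL;DR are two views of the same guarantees. As a short worked step, assume a bound of the form below with some problem-dependent constant $C > 0$ (the constant is a placeholder, not a value taken from the paper):

```latex
% Convex, deterministic case: rate O(t^{-1/2})
f(\bar{\mathbf{x}}^t) - f^\star \le \frac{C}{\sqrt{t}} \le \varepsilon
\quad \Longleftrightarrow \quad
t \ge \frac{C^2}{\varepsilon^2} = O(\varepsilon^{-2}).

% Rate O(t^{-1/3}) (non-convex FW gap, or stochastic convex case)
\frac{C}{t^{1/3}} \le \varepsilon
\quad \Longleftrightarrow \quad
t \ge \frac{C^3}{\varepsilon^3} = O(\varepsilon^{-3}).
```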

Paper Structure (30 sections, 6 theorems, 71 equations, 3 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Federated Learning.
Frank-Wolfe Algorithm.
Federated Frank-Wolfe Algorithm
Convergence Guarantees
Privacy and Communication Benefits
Design Variants of FedFW
FedFW with stochastic gradients
FedFW with partial client participation
FedFW with split constraints for stragglers
FedFW with augmented Lagrangian
Numerical Experiments
Datasets.
Comparison of algorithms in the convex setting
...and 15 more sections

Key Result

Theorem 1

Consider the problem defined in the Introduction with $L$-smooth and convex loss functions $f_i$. Then the estimate $\bar{\mathbf{x}}^t$ generated by FedFW with step-size $\eta_t = \frac{2}{t+1}$ and penalty parameter $\lambda_t = \lambda_0 \sqrt{t+1}$, for any $\lambda_0 > 0$, satisfies an objective suboptimality bound of order $O(t^{-1/2})$.
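The schedule in Theorem 1 can be seen in action with a small sketch. The loop below is one plausible reading of the method as described in the TL;DR (per-client LMO directions, a quadratic consensus penalty, and only LMO outputs sent to the server). It is not the authors' reference implementation. The toy losses $f_i(\mathbf{x}) = \frac{1}{2}\|\mathbf{x} - \mathbf{c}_i\|^2$, the $\ell_1$-ball constraint, the $1/n$ scaling of the local gradient, and all function names are assumptions made for illustration.

```python
import math


def lmo_l1_ball(g, radius):
    """Vertex of the l1 ball minimizing <g, s> (hypothetical helper)."""
    k = max(range(len(g)), key=lambda j: abs(g[j]))
    s = [0.0] * len(g)
    if g[k] != 0.0:
        s[k] = -radius if g[k] > 0 else radius
    return s


def fedfw_sketch(centers, radius=1.0, lam0=1.0, num_rounds=200):
    """Toy FedFW-style loop; client i holds f_i(x) = 0.5 * ||x - centers[i]||^2."""
    n, d = len(centers), len(centers[0])
    x_clients = [[0.0] * d for _ in range(n)]  # local models, start feasible
    x_bar = [0.0] * d                          # server estimate
    for t in range(1, num_rounds + 1):
        eta = 2.0 / (t + 1)              # step-size from Theorem 1
        lam = lam0 * math.sqrt(t + 1)    # penalty parameter from Theorem 1
        s_sum = [0.0] * d
        for i in range(n):
            x_i, c_i = x_clients[i], centers[i]
            # Assumed local direction: scaled loss gradient plus penalty gradient.
            g = [(x_i[j] - c_i[j]) / n + lam * (x_i[j] - x_bar[j])
                 for j in range(d)]
            s = lmo_l1_ball(g, radius)   # the only quantity sent to the server
            # Convex combination keeps the local model inside the ball.
            x_clients[i] = [(1 - eta) * x_i[j] + eta * s[j] for j in range(d)]
            s_sum = [s_sum[j] + s[j] for j in range(d)]
        # Server averages the LMO outputs and takes the same convex step.
        x_bar = [(1 - eta) * x_bar[j] + eta * s_sum[j] / n for j in range(d)]
    return x_bar


if __name__ == "__main__":
    print(fedfw_sketch([[2.0, 0.0], [0.0, 2.0]], radius=1.0))
```

Because every update is a convex combination of points in the constraint set, the server estimate stays feasible without any projection step. In this sketch the server estimate also remains the average of the local models at every round.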

Figures (3)

Figure 1: Privacy benefits of sharing linear minimization outputs vs gradients. The Deep Leakage Algorithm can recover CIFAR-100 data points from shared gradients. Sharing linear minimization outputs enhances privacy. (a) and (b) compare reconstructions from gradients and from LMO outputs with $\ell_2$- and $\ell_1$-norm ball constraints after $10^5$ iterations for two different data points. (c) and (d) present the reconstruction PSNR as a function of iterations for the corresponding images.
Figure 2: Effect of participation $\texttt{p}$ on FedFW. The experiment was conducted with MCLR using synthetic data, an $\ell_1$ constraint, and two different choices of $\lambda_0$.
Figure 4: Effect of the initial penalty ($\lambda_0$) on FedFW. (a) and (b) show the results for the convex setting; (c) and (d) show the results for the non-convex setting.

Theorems & Definitions (11)

Theorem 1
Remark 1
Theorem 2
Remark 2
Theorem 3
Remark 3
Lemma 1
proof
Lemma 2: Boundedness of the gradient
proof
...and 1 more
