Federated Frank-Wolfe Algorithm
Ali Dadras, Sourasekhar Banerjee, Karthik Prakhya, Alp Yurtsever
TL;DR
This work addresses constrained optimization in federated learning (FL) by introducing FedFW, a projection-free method based on the Frank-Wolfe algorithm. FedFW replaces the hard consensus constraint with a smooth quadratic penalty and uses a linear minimization oracle (LMO) to compute each client's update direction. It converges at rate $O(t^{-1/2})$ for smooth convex objectives and, measured by the Frank-Wolfe gap, at rate $O(t^{-1/3})$ for smooth non-convex objectives; a stochastic variant achieves $O(t^{-1/3})$ in the convex setting. FedFW preserves data privacy by communicating only LMO outputs, which are sparse or low-rank signals. It also supports practical extensions: stochastic gradients, partial participation, straggler-aware constraints, and an augmented Lagrangian version (FedFW+). Experiments on convex MCLR and non-convex CNN/DNN tasks with IID and non-IID data show competitive performance with reduced communication, supporting FedFW's practicality for scalable, privacy-preserving constrained FL.
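To illustrate why LMO outputs can be sparse, the following is a minimal sketch of a linear minimization oracle over an $\ell_1$-ball and one classical Frank-Wolfe step. It is not the FedFW update itself: the $\ell_1$-ball constraint, the function names `lmo_l1_ball` and `frank_wolfe_step`, and the numerical values are assumptions chosen only for illustration.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Solve argmin_{||s||_1 <= radius} <grad, s> (hypothetical helper).

    A minimizer is a single signed, scaled coordinate vector, so the
    output has at most one nonzero entry.
    """
    s = np.zeros_like(grad, dtype=float)
    i = int(np.argmax(np.abs(grad)))      # coordinate with largest |gradient|
    s[i] = -radius * np.sign(grad[i])     # move against the gradient sign
    return s

def frank_wolfe_step(x, grad, step, radius=1.0):
    """One classical Frank-Wolfe step: a convex combination of x and the LMO output."""
    s = lmo_l1_ball(grad, radius)
    return (1.0 - step) * x + step * s

# Illustrative values only.
grad = np.array([0.5, -2.0, 1.0])
s = lmo_l1_ball(grad)
x_next = frank_wolfe_step(np.zeros(3), grad, step=0.5)
```

Because the new iterate is a convex combination of feasible points, it stays inside the constraint set without any projection, and only the one-sparse vector `s` would need to be transmitted in this toy setting.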
Abstract
Federated learning (FL) has attracted considerable attention in recent years for building privacy-preserving collaborative learning systems. However, FL algorithms for constrained machine learning problems are still limited, particularly when the projection step is costly. To address this gap, we propose a Federated Frank-Wolfe Algorithm (FedFW). FedFW features data privacy, low per-iteration cost, and communication of sparse signals. In the deterministic setting, FedFW achieves an $\varepsilon$-suboptimal solution within $O(\varepsilon^{-2})$ iterations for smooth and convex objectives, and $O(\varepsilon^{-3})$ iterations for smooth but non-convex objectives. Furthermore, we present a stochastic variant of FedFW and show that it finds a solution within $O(\varepsilon^{-3})$ iterations in the convex setting. We demonstrate the empirical performance of FedFW on several machine learning tasks.
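The iteration counts above are equivalent to convergence rates stated in terms of the iteration index $t$. As a short worked step, assume the error after $t$ iterations is bounded by $C\,t^{-p}$ for some constant $C > 0$ and exponent $p > 0$ (the symbols $C$ and $p$ are introduced here only for illustration). Then

$$
C\,t^{-p} \le \varepsilon
\quad\Longleftrightarrow\quad
t \ge \left(\frac{C}{\varepsilon}\right)^{1/p}.
$$

With $p = 1/2$ this gives $t = O(\varepsilon^{-2})$, and with $p = 1/3$ it gives $t = O(\varepsilon^{-3})$. So an $O(t^{-1/2})$ rate corresponds to $O(\varepsilon^{-2})$ iterations, and an $O(t^{-1/3})$ rate corresponds to $O(\varepsilon^{-3})$ iterations.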
