Non-convex composite federated learning with heterogeneous data
Jiaojiao Zhang, Jiang Hu, Mikael Johansson
TL;DR
This work tackles non-convex composite federated learning (FL) with heterogeneous data by decoupling the proximal operator evaluation from server–client communication. Clients perform local updates so that they communicate with the server less often, and each client sends only a single $d$-dimensional vector per communication round. Each client maintains pre- and post-proximal local models and uses a drift-correction term to align local updates with the global objective, while the server aggregates via proximal steps to emulate centralized proximal SGD. The authors prove sublinear convergence to a bounded residual in the general non-convex case and linear convergence to a bounded residual under the proximal Polyak-Łojasiewicz (PL) inequality; the residuals depend on gradient variance, batch size, and subgradient bounds. Empirical evaluations on sparse logistic regression and federated CNN training on MNIST show that the approach outperforms state-of-the-art methods, particularly under data heterogeneity and with fewer communication rounds. The method is aimed at practical FL with non-smooth, non-convex objectives.
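For orientation, the standard composite setup that this summary refers to can be written as below. The notation is assumed here for illustration and may differ from the paper's: $n$ clients, local losses $f_i$ (smooth, possibly non-convex), a possibly non-smooth regularizer $g$, a step size $\eta > 0$, and a stochastic gradient estimate $\tilde{\nabla} f$.

$$\min_{x \in \mathbb{R}^d} \; F(x) := f(x) + g(x), \qquad f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x)$$

$$\operatorname{prox}_{\eta g}(v) = \arg\min_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\eta} \, \| y - v \|^2 \right\}$$

$$x^{t+1} = \operatorname{prox}_{\eta g}\!\left( x^{t} - \eta \, \tilde{\nabla} f(x^{t}) \right)$$

The last line is the textbook centralized proximal SGD iteration, which is the baseline the server-side update is said to emulate.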
Abstract
We propose an innovative algorithm for non-convex composite federated learning that decouples the proximal operator evaluation from the communication between server and clients. Moreover, each client uses local updates to communicate less frequently with the server, sends only a single $d$-dimensional vector per communication round, and overcomes issues with client drift. In the analysis, challenges arise from the decoupling strategy and the local updates in the algorithm, as well as from the non-convex and non-smooth nature of the problem. We establish sublinear convergence to a bounded residual error under general non-convexity and linear convergence to a bounded residual error under the proximal Polyak-Łojasiewicz inequality. In the numerical experiments, we demonstrate the superiority of our algorithm over state-of-the-art methods on both synthetic and real datasets.
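The role of the proximal operator can be illustrated with a small sketch. It is not the authors' algorithm. It assumes the regularizer is the $\ell_1$ norm (a common choice for sparse logistic regression, not confirmed by the text above), and all function names and toy values are hypothetical. It shows a single proximal gradient step, and it shows that applying the proximal operator and averaging client models do not commute, which is one plausible reason for tracking pre-proximal and post-proximal models separately.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise (assumed l1 regularizer)."""
    out = []
    for vi in v:
        if vi > tau:
            out.append(vi - tau)
        elif vi < -tau:
            out.append(vi + tau)
        else:
            out.append(0.0)
    return out


def average(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def prox_sgd_step(x, grad, lr, lam):
    """One proximal gradient step for f(x) + lam * ||x||_1.

    `grad` stands in for a (stochastic) gradient of the smooth part f at x.
    """
    pre_prox = [xi - lr * gi for xi, gi in zip(x, grad)]
    return soft_threshold(pre_prox, lr * lam)


# Hypothetical pre-proximal models held by two clients.
pre_prox_models = [[1.0, -0.2], [-0.4, 0.6]]
tau = 0.5

# Average the pre-proximal models first, then apply the proximal operator once.
avg_then_prox = soft_threshold(average(pre_prox_models), tau)

# Apply the proximal operator on each client, then average the results.
prox_then_avg = average([soft_threshold(z, tau) for z in pre_prox_models])

print("average then prox:", avg_then_prox)
print("prox then average:", prox_then_avg)
```

In this toy case, averaging first and then thresholding yields an all-zero vector, whereas thresholding on each client and then averaging yields a vector with both entries nonzero. The two orders therefore give different models with different sparsity patterns.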
