Communication-Efficient Federated Bilevel Optimization with Local and Global Lower Level Problems
Junyi Li, Feihu Huang, Heng Huang
TL;DR
The paper tackles Federated Bilevel Optimization, where both the upper- and lower-level problems are distributed across heterogeneous clients. It introduces FedBiOAcc, which recasts hyper-gradient estimation as a distributed quadratic problem intertwined with the upper- and lower-level problems, and applies momentum-based variance reduction to achieve $O(\epsilon^{-1})$ communication complexity and $O(\epsilon^{-1.5})$ sample complexity, with linear speedup in the number of clients $M$. It also analyzes a variant for locally managed lower-level problems, FedBiOAcc-Local, which attains the same $O(\epsilon^{-1.5})$ iteration rate but without the linear speedup in $M$. Both methods are validated on Federated Data Cleaning and Federated Hyper-representation Learning, where they are reported to show superior performance and robustness. Overall, the work advances scalable, communication-efficient bilevel optimization in federated settings by combining a quadratic hyper-gradient formulation with STORM-style variance reduction and careful handling of heterogeneity.
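As a sketch of the quadratic view of the hyper-gradient, the following uses standard bilevel notation that is assumed here rather than taken from the paper: $f^{(m)}$ and $g^{(m)}$ are client $m$'s upper- and lower-level objectives, $f$ and $g$ are their averages over the $M$ clients, $y_x$ is the lower-level solution, and $u$ is an auxiliary variable. The lower-level problem is assumed strongly convex in $y$.

$$
\min_{x} \; h(x) = \frac{1}{M}\sum_{m=1}^{M} f^{(m)}(x, y_x), \qquad y_x = \arg\min_{y} \; \frac{1}{M}\sum_{m=1}^{M} g^{(m)}(x, y)
$$

$$
\nabla h(x) = \nabla_x f(x, y_x) - \nabla_{xy} g(x, y_x)\, u_x, \qquad u_x = \arg\min_{u} \; \frac{1}{M}\sum_{m=1}^{M} \left[ \tfrac{1}{2}\, u^\top \nabla_{yy} g^{(m)}(x, y_x)\, u - \left\langle \nabla_y f^{(m)}(x, y_x),\, u \right\rangle \right]
$$

Setting the gradient of the quadratic to zero gives $u_x = [\nabla_{yy} g(x, y_x)]^{-1} \nabla_y f(x, y_x)$, which recovers the usual implicit-differentiation form of the hyper-gradient. The quadratic objective is an average of per-client terms, so it can be minimized with ordinary federated updates, whereas averaging per-client inverse Hessians would not in general give the same quantity.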
Abstract
Bilevel Optimization has witnessed notable progress recently with the emergence of new efficient algorithms. However, its application in the Federated Learning setting remains relatively underexplored, and the impact of Federated Learning's inherent challenges on the convergence of bilevel algorithms remains obscure. In this work, we investigate Federated Bilevel Optimization problems and propose a communication-efficient algorithm, named FedBiOAcc. The algorithm leverages an efficient estimation of the hyper-gradient in the distributed setting and utilizes momentum-based variance-reduction acceleration. Remarkably, FedBiOAcc achieves a communication complexity of $O(\epsilon^{-1})$, a sample complexity of $O(\epsilon^{-1.5})$, and linear speedup with respect to the number of clients. We also analyze a special case of Federated Bilevel Optimization problems in which the lower-level problems are locally managed by clients. We prove that FedBiOAcc-Local, a modified version of FedBiOAcc, converges at the same rate for this type of problem. Finally, we validate the proposed algorithms on two real-world tasks: Federated Data-cleaning and Federated Hyper-representation Learning. Empirical results show superior performance of our algorithms.
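To illustrate the momentum-based variance reduction mentioned above, here is a minimal single-machine sketch of a STORM-style gradient estimator. It is not the FedBiOAcc algorithm itself: the toy objective $0.5\|x\|^2$, the additive noise model, the function names, and the constant step size and momentum weight are all assumptions made for illustration. The key point is that the same sample is evaluated at both the current and the previous iterate, so the correction term cancels much of the noise.

```python
import random


def storm_update(grad_new, grad_old, d_prev, a):
    """One STORM-style momentum variance-reduction step.

    grad_new: stochastic gradient at the current iterate, sample xi_t
    grad_old: stochastic gradient at the previous iterate, SAME sample xi_t
    d_prev:   previous estimator d_{t-1}
    a:        momentum weight in (0, 1]; a = 1 recovers plain SGD
    """
    return [gn + (1.0 - a) * (dp - go)
            for gn, go, dp in zip(grad_new, grad_old, d_prev)]


def noisy_grad(x, noise):
    # Hypothetical toy objective 0.5 * ||x||^2 with additive gradient noise.
    return [xi + ni for xi, ni in zip(x, noise)]


def run(steps=200, lr=0.1, a=0.1, seed=0):
    rng = random.Random(seed)
    x_prev = [1.0, -2.0]
    # Initialize the estimator with a plain stochastic gradient.
    d = noisy_grad(x_prev, [rng.gauss(0, 0.1) for _ in x_prev])
    x = [xi - lr * di for xi, di in zip(x_prev, d)]
    for _ in range(steps):
        # Draw one sample and reuse it at both iterates.
        noise = [rng.gauss(0, 0.1) for _ in x]
        d = storm_update(noisy_grad(x, noise), noisy_grad(x_prev, noise), d, a)
        x_prev, x = x, [xi - lr * di for xi, di in zip(x, d)]
    return x
```

Calling `run()` drives the iterate toward the minimizer at the origin. In a federated variant, each client would maintain such estimators locally and the server would average them periodically; that communication schedule is not shown here.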
