Communication-Efficient Federated Bilevel Optimization with Local and Global Lower Level Problems

Junyi Li; Feihu Huang; Heng Huang

TL;DR

The paper tackles Federated Bilevel Optimization in which both the upper- and lower-level problems are distributed across heterogeneous clients. It introduces FedBiOAcc, which treats hyper-gradient estimation as a distributed quadratic problem solved alongside the upper- and lower-level problems (three intertwined problems in total) and applies momentum-based variance reduction to achieve $O(\epsilon^{-1})$ communication complexity and $O(\epsilon^{-1.5})$ sample complexity, with linear speedup in the number of clients $M$. It also analyzes a variant with local lower-level problems, FedBiOAcc-Local, which attains the same $O(\epsilon^{-1.5})$ iteration rate but without the linear speedup in $M$, and validates the methods on Federated Data Cleaning and Federated Hyper-representation Learning, where they demonstrate superior performance and robustness. Overall, the work advances scalable, communication-efficient bilevel optimization in federated settings by combining a quadratic hyper-gradient formulation with STORM-style variance reduction and careful handling of heterogeneity.
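The quadratic view of hyper-gradient estimation can be illustrated on a toy problem. The sketch below is a single-machine example and not the paper's federated algorithm: the objectives, the matrices `A` and `B`, the vector `c`, the weight `lam`, and the helper names are all assumptions chosen for illustration. It uses the standard implicit-differentiation form of the hyper-gradient, $\nabla_x f - \nabla_{xy} g\,[\nabla_{yy} g]^{-1} \nabla_y f$, and obtains the Hessian-inverse-vector product by running gradient descent on the quadratic $\frac{1}{2} u^\top \nabla_{yy} g\, u - \langle \nabla_y f, u \rangle$ instead of inverting a matrix.

```python
# Toy, single-machine illustration (NOT the paper's federated algorithm).
# Assumed lower level: g(x, y) = 0.5 * y^T A y - y^T B x, A symmetric positive definite.
# Assumed upper level: f(x, y) = 0.5 * ||y - c||^2 + 0.5 * lam * ||x||^2.

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def solve_quadratic(A, b, tau=0.2, steps=500):
    """Minimize q(u) = 0.5 * u^T A u - b^T u by gradient descent (u -> A^{-1} b)."""
    u = [0.0] * len(b)
    for _ in range(steps):
        Au = matvec(A, u)
        u = [ui - tau * (aui - bi) for ui, aui, bi in zip(u, Au, b)]
    return u

def hypergradient(x, y, A, B, c, lam):
    grad_y_f = [yi - ci for yi, ci in zip(y, c)]
    grad_x_f = [lam * xi for xi in x]
    u = solve_quadratic(A, grad_y_f)        # approximates [grad_yy g]^{-1} grad_y f
    # For this g, grad_xy g = -B^T, so the hyper-gradient is grad_x f + B^T u.
    Bt_u = matvec(transpose(B), u)
    return [gx + bu for gx, bu in zip(grad_x_f, Bt_u)]

A = [[2.0, 0.5], [0.5, 1.0]]
B = [[1.0, 0.0], [1.0, 2.0]]
c = [1.0, -1.0]
lam = 0.1
x = [0.5, -0.3]

y_star = solve_quadratic(A, matvec(B, x))   # lower-level solution y*(x) = A^{-1} B x
hg = hypergradient(x, y_star, A, B, c, lam)
print(hg)
```

In a federated setting the matrices and gradients above would be averages over clients, which is what makes the quadratic subproblem itself a distributed optimization problem.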

Abstract

Bilevel Optimization has witnessed notable progress recently, with efficient new algorithms emerging. However, its application in the Federated Learning setting remains relatively underexplored, and the impact of Federated Learning's inherent challenges on the convergence of bilevel algorithms remains unclear. In this work, we investigate Federated Bilevel Optimization problems and propose a communication-efficient algorithm, named FedBiOAcc. The algorithm leverages an efficient estimation of the hyper-gradient in the distributed setting and uses momentum-based variance-reduction acceleration. Remarkably, FedBiOAcc achieves a communication complexity of $O(\epsilon^{-1})$, a sample complexity of $O(\epsilon^{-1.5})$, and linear speedup with respect to the number of clients. We also analyze a special case of Federated Bilevel Optimization problems in which the lower-level problems are managed locally by clients. We prove that FedBiOAcc-Local, a modified version of FedBiOAcc, converges at the same rate for this type of problem. Finally, we validate the proposed algorithms on two real-world tasks: Federated Data Cleaning and Federated Hyper-representation Learning. Empirical results show superior performance of our algorithms.
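The momentum-based variance reduction mentioned in the abstract follows the STORM pattern, which can be sketched on a one-dimensional toy problem. This is a minimal illustration under stated assumptions, not FedBiOAcc itself: the function name `storm_minimize`, the constant step size `lr`, the constant momentum weight `a`, and the toy objective $\frac{1}{2}(x-3)^2$ are all hypothetical. The key step is that each fresh sample is evaluated at both the current and the previous iterate, and the difference corrects the running estimate.

```python
import random

def storm_minimize(grad, x0, steps, lr=0.1, a=0.3, rng=None):
    """Gradient descent with a STORM-style variance-reduced estimator.

    grad(x, xi) returns a stochastic gradient at x for the sample xi.
    lr and a are held constant here for simplicity (an assumption).
    """
    rng = rng or random.Random(0)
    x_prev = x0
    d = grad(x_prev, rng.random())      # start from a plain stochastic gradient
    history = [(x_prev, d)]
    x = x_prev - lr * d
    for _ in range(steps):
        xi = rng.random()               # one fresh sample, used at BOTH iterates
        d = grad(x, xi) + (1.0 - a) * (d - grad(x_prev, xi))
        history.append((x, d))
        x_prev, x = x, x - lr * d
    return x, history

def noisy_grad(x, xi, sigma=0.5):
    """Gradient of 0.5 * (x - 3)^2 plus zero-mean noise driven by xi in [0, 1)."""
    return (x - 3.0) + sigma * (xi - 0.5)

x_final, _ = storm_minimize(noisy_grad, x0=0.0, steps=200)
print(x_final)
```

With `a = 1` the update reduces to plain SGD; smaller values reuse more of the previous estimate, which is where the variance reduction comes from.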

Paper Structure (40 sections, 48 theorems, 288 equations, 6 figures, 1 table, 4 algorithms)

Introduction
Related Works
Federated Bilevel Optimization
Some Mild Assumptions
The FedBiOAcc Algorithm
Convergence Analysis
Federated Bilevel Optimization with Local Lower Level Problems
Numerical Experiments
Federated Data Cleaning
Federated Hyper-Representation Learning
Conclusion
Assumptions
More Experimental Details and Results
Federated Data Cleaning
Federated Hyper-Representation Learning
...and 25 more sections

Key Result

Theorem 3.6

Suppose that in the FedBiOAcc algorithm we choose the learning rate $\alpha_t = \frac{\delta}{(u + t)^{1/3}}$, $t \in [T]$, for some constants $\delta$ and $u$; that $c_{\nu}$, $c_{\omega}$, $c_{u}$ are set to suitable constants; and that $\eta$, $\gamma$, $\tau$, and $r$ are small values determined by the Lipschitz constants … To reach an $\epsilon$-stationary point, we need $T = O(\kappa^{8}(bM)^{-1}\epsilon^{-1.5})$, $I = \dots$
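The step-size schedule in the theorem is easy to tabulate. The sketch below assumes placeholder values for the constants (`delta = 1.0`, `u = 7.0`) and an arbitrary horizon `T = 1000`; none of these come from the paper. It also prints the accumulated step size, which grows roughly like $T^{2/3}$; in STORM-type analyses this growth is typically what leads to an iteration count of order $\epsilon^{-1.5}$.

```python
def step_size(t, delta=1.0, u=7.0):
    """alpha_t = delta / (u + t)^(1/3); delta and u are placeholder constants."""
    return delta / (u + t) ** (1.0 / 3.0)

T = 1000                                           # arbitrary horizon for illustration
alphas = [step_size(t) for t in range(1, T + 1)]   # t in [T] = {1, ..., T}

total = sum(alphas)
ratio = total / T ** (2.0 / 3.0)                   # stays bounded as T grows

print(alphas[0], alphas[-1])
print(total, ratio)
```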

Figures (6)

Figure 1: Validation Error vs Communication Rounds. From Left to Right: $\rho=0.1, 0.4, 0.8, 0.95$. The local step $I$ is set as 5 for FedBiO, FedBiOAcc and FedAvg.
Figure 2: Validation Error vs Communication Rounds with different numbers of clients per epoch. From Left to Right: $\rho=0.1, 0.4, 0.8, 0.95$. The local step $I$ is set to 5.
Figure 3: Validation Error vs Communication Rounds with different numbers of local steps $I$. From Left to Right: $\rho=0.1, 0.4, 0.8, 0.95$.
Figure 4: Validation Error vs Communication Rounds. The top row shows the result for the Omniglot Dataset and the bottom row shows MiniImageNet. From Left to Right: 5-way-1-shot, 5-way-5-shot, 20-way-1-shot, 20-way-5-shot. The local step $I$ is set to 5.
Figure 5: Results for the Omniglot Dataset. From Left to Right: 5-way-1-shot, 5-way-5-shot, 20-way-1-shot, 20-way-5-shot.
...and 1 more figures

Theorems & Definitions (85)

Theorem 3.6
Proposition C.1
Lemma C.2
proof
Lemma C.3
proof
Lemma C.4
proof
Lemma C.5
proof
...and 75 more
