Federated Representation Learning in the Under-Parameterized Regime

Renpu Liu; Cong Shen; Jing Yang

TL;DR

The paper addresses federated representation learning when the shared representation is insufficient to express all clients’ ground-truth models. It introduces FLUTE, a novel algorithm that regularizes and aggregates on the server to distill the top-$k$ global subspace while learning client heads, providing theoretical guarantees for linear models and extending to non-linear settings. Key contributions include a new loss with subspace-preserving regularizers, explicit per-client sample complexity bounds showing near-linear speedups in $M$, and exponential convergence under adequate data, plus empirical evidence on synthetic data and real datasets like CIFAR-10/100. The work has practical impact for resource-constrained FL deployments where model under-parameterization is inevitable, offering a principled pathway to efficient, globally coherent representations.

Abstract

Federated representation learning (FRL) is a popular personalized federated learning (FL) framework where clients work together to train a common representation while retaining their personalized heads. Existing studies, however, largely focus on the over-parameterized regime. In this paper, we make an initial effort to investigate FRL in the under-parameterized regime, where the FL model is insufficient to express the variations in all ground-truth models. We propose a novel FRL algorithm, FLUTE, and theoretically characterize its sample complexity and convergence rate for linear models in the under-parameterized regime. To the best of our knowledge, this is the first FRL algorithm with provable performance guarantees in this regime. FLUTE features a data-independent random initialization and a carefully designed objective function that aids the distillation of the subspace spanned by the global optimal representation from the misaligned local representations. On the technical side, we bridge low-rank matrix approximation techniques with the FL analysis, which may be of broad interest. We also extend FLUTE beyond linear representations. Experimental results demonstrate that FLUTE outperforms state-of-the-art FRL solutions in both synthetic and real-world tasks.
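To make the under-parameterized setting concrete, the following is a minimal NumPy sketch, not the FLUTE algorithm itself. It assumes linear models, stacks hypothetical client ground-truth models as columns of a matrix `Phi_star` of rank `r`, and takes the target representation to be the top-`k` left singular subspace with `k < r`. The sizes `d`, `M`, `r`, `k` and the helper names are illustrative assumptions, not values or functions from the paper.

```python
import numpy as np

def top_k_subspace(Phi, k):
    """Return an orthonormal d x k basis of the top-k left singular subspace of Phi,
    together with all singular values of Phi."""
    U, s, _ = np.linalg.svd(Phi, full_matrices=False)
    return U[:, :k], s

def projection_error(Phi, B):
    """Frobenius error left after projecting Phi onto the column span of orthonormal B."""
    return np.linalg.norm(Phi - B @ (B.T @ Phi), ord="fro")

rng = np.random.default_rng(0)
d, M, r, k = 20, 50, 6, 3  # hypothetical sizes; k < r means under-parameterized

# Hypothetical ground truth: column i is client i's d-dimensional linear model.
Phi_star = rng.standard_normal((d, r)) @ rng.standard_normal((r, M))

# Best k-dimensional shared representation in the low-rank approximation sense.
B_opt, sing = top_k_subspace(Phi_star, k)
err_opt = projection_error(Phi_star, B_opt)

# A random k-dimensional subspace for comparison.
B_rand, _ = np.linalg.qr(rng.standard_normal((d, k)))
err_rand = projection_error(Phi_star, B_rand)

print(f"residual with top-{k} subspace: {err_opt:.3f}")
print(f"residual with random subspace: {err_rand:.3f}")
```

Because `k < r`, no `k`-dimensional representation can express every client model exactly, so the residual stays strictly positive even for the best subspace. By the Eckart-Young theorem, that best residual equals the square root of the sum of the squared discarded singular values.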

Paper Structure (30 sections, 21 theorems, 134 equations, 12 figures, 1 table, 2 algorithms)

Introduction
Related Work
Problem Formulation
The FLUTE Algorithm
Challenges
A New Loss Function
FLUTE for Linear Model
Theoretical Guarantees
Main Results
Proof Sketch
General FLUTE
Experimental Results
Synthetic Datasets
Real World Datasets
Conclusion
...and 15 more sections

Key Result

Theorem 5.1

Set $\gamma_1 = \frac{1}{4}$ and $\gamma_2 = \frac{1}{8}$ in def:AMA-emp. Let $0<\alpha \lesssim \frac{1}{10d}$, and $\eta:=\eta_l=\eta_r\lesssim \frac{\Delta^2}{228{\lambda_1}^3}$. Then, for any $\epsilon>0$ and $0\leq\delta\leq 1$, under alg:fedUP, there exist positive constants $c$ and $c'$ such and $t\geq \frac{\log(\epsilon\sqrt{M}\eta\Delta^2/{c'\lambda_1^2\sqrt{k}})}{\log(1-\eta\Delta/{16}

Figures (12)

Figure 1: Experimental results with synthetic datasets.
Figure 2: Behavior of locally optimized heads and globally optimized heads.
Figure 3: Experimental results with synthetic datasets.
Figure 4: Experimental results with synthetic datasets.
Figure 5: Experimental results for CIFAR10 when $M=50, m'=2$.
...and 7 more figures

Theorems & Definitions (45)

Definition 3.1: Under-Parameterization in FRL
Example 1
Remark 4.1
Theorem 5.1: Sample complexity
Remark 5.2
Remark 5.3
Remark 5.4
Theorem 5.5: Convergence rate
Remark 5.6
Remark 5.7
...and 35 more
