Federated Representation Learning in the Under-Parameterized Regime

Renpu Liu, Cong Shen, Jing Yang

TL;DR

The paper addresses federated representation learning when the shared representation is insufficient to express all clients’ ground-truth models. It introduces FLUTE, a novel algorithm that regularizes and aggregates on the server to distill the top-$k$ global subspace while learning client heads, providing theoretical guarantees for linear models and extending to non-linear settings. Key contributions include a new loss with subspace-preserving regularizers, explicit per-client sample complexity bounds showing near-linear speedups in $M$, and exponential convergence under adequate data, plus empirical evidence on synthetic data and real datasets like CIFAR-10/100. The work has practical impact for resource-constrained FL deployments where model under-parameterization is inevitable, offering a principled pathway to efficient, globally coherent representations.

Abstract

Federated representation learning (FRL) is a popular personalized federated learning (FL) framework where clients work together to train a common representation while retaining their personalized heads. Existing studies, however, largely focus on the over-parameterized regime. In this paper, we make an initial effort to investigate FRL in the under-parameterized regime, where the FL model is insufficient to express the variations in all ground-truth models. We propose a novel FRL algorithm, FLUTE, and theoretically characterize its sample complexity and convergence rate for linear models in the under-parameterized regime. To the best of our knowledge, this is the first FRL algorithm with provable performance guarantees in this regime. FLUTE features a data-independent random initialization and a carefully designed objective function that aids the distillation of the subspace spanned by the global optimal representation from the misaligned local representations. On the technical side, we bridge low-rank matrix approximation techniques with the FL analysis, which may be of broad interest. We also extend FLUTE beyond linear representations. Experimental results demonstrate that FLUTE outperforms state-of-the-art FRL solutions in both synthetic and real-world tasks.
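
To make the under-parameterized setting concrete, here is a minimal NumPy sketch of the centralized low-rank approximation that the abstract refers to. It is not the FLUTE algorithm. It assumes linear ground-truth models stacked as columns of a matrix `theta` of rank `r`, and a shared representation of dimension `k < r`. The names `theta` and `top_k_subspace` and all sizes are hypothetical choices for illustration. By the Eckart-Young theorem, the best rank-`k` approximation in Frobenius norm is obtained by projecting onto the top-`k` left singular subspace, and the unavoidable residual equals the energy in the discarded singular values.

```python
import numpy as np


def top_k_subspace(theta, k):
    """Return an orthonormal basis of the top-k left singular subspace of
    theta, the projection of theta onto that subspace, and all singular values."""
    u, s, _ = np.linalg.svd(theta, full_matrices=False)
    basis = u[:, :k]                      # d x k, orthonormal columns
    approx = basis @ (basis.T @ theta)    # best rank-k approximation of theta
    return basis, approx, s


# Hypothetical sizes: input dimension d, M clients, true rank r, representation size k < r.
rng = np.random.default_rng(0)
d, M, r, k = 20, 50, 6, 3

# Assumed ground-truth linear models: one column per client, jointly of rank r.
theta = rng.standard_normal((d, r)) @ rng.standard_normal((r, M))

basis, approx, s = top_k_subspace(theta, k)

# Because k < r, no k-dimensional representation can fit every client exactly.
residual = np.linalg.norm(theta - approx, "fro")
tail = np.sqrt(np.sum(s[k:] ** 2))        # energy in the discarded singular values
print(f"residual = {residual:.4f}, tail energy = {tail:.4f}")
```

The two printed quantities agree, which is the sense in which the top-$k$ subspace is the best any $k$-dimensional shared representation can do in this linear setting. A federated method has to recover that subspace from per-client data without access to `theta` itself.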

Paper Structure

This paper contains 30 sections, 21 theorems, 134 equations, 12 figures, 1 table, 2 algorithms.

Key Result

Theorem 5.1

Set $\gamma_1 = \frac{1}{4}$ and $\gamma_2 = \frac{1}{8}$ in def:AMA-emp. Let $0<\alpha \lesssim \frac{1}{10d}$, and $\eta:=\eta_l=\eta_r\lesssim \frac{\Delta^2}{228{\lambda_1}^3}$. Then, for any $\epsilon>0$ and $0\leq\delta\leq 1$, under alg:fedUP, there exist positive constants $c$ and $c'$ such and $t\geq \frac{\log(\epsilon\sqrt{M}\eta\Delta^2/{c'\lambda_1^2\sqrt{k}})}{\log(1-\eta\Delta/{16}

Figures (12)

  • Figure 1: Experimental results with synthetic datasets.
  • Figure 2: Behavior of locally optimized heads and globally optimized heads.
  • Figure 3: Experimental results with synthetic datasets.
  • Figure 4: Experimental results with synthetic datasets.
  • Figure 5: Experimental results for CIFAR10 when $M=50, m'=2$.
  • ...and 7 more figures

Theorems & Definitions (45)

  • Definition 3.1: Under-Parameterization in FRL
  • Example 1
  • Remark 4.1
  • Theorem 5.1: Sample complexity
  • Remark 5.2
  • Remark 5.3
  • Remark 5.4
  • Theorem 5.5: Convergence rate
  • Remark 5.6
  • Remark 5.7
  • ...and 35 more