A Closer Look at Personalized Fine-Tuning in Heterogeneous Federated Learning

Minghui Chen, Hrad Ghoukasian, Ruinan Jin, Zehua Wang, Sai Praneeth Karimireddy, Xiaoxiao Li

TL;DR

The paper tackles the challenge of balancing client personalization and global generalization in heterogeneous federated learning by adapting the LP-FT (Linear Probing followed by Fine-Tuning) strategy to post-hoc personalized fine-tuning (PFT). Through large-scale empirical studies across seven datasets and multiple PFT baselines, it demonstrates that LP-FT reduces federated feature distortion and achieves better global and average performance than standard fine-tuning methods. The authors provide a theoretical framework based on a two-layer linear model showing that LP-FT yields lower global loss under concept shift and under combined covariate-concept shift, with a heterogeneity threshold below which LP-FT dominates. Supplementary experiments, including label-shift and distortion analyses, corroborate the theory and highlight LP-FT as a robust, deployment-friendly approach to personalization in FL.

Abstract

Federated Learning (FL) enables decentralized, privacy-preserving model training but struggles to balance global generalization and local personalization due to non-identical data distributions across clients. Personalized Fine-Tuning (PFT), a popular post-hoc solution, fine-tunes the final global model locally but often overfits to skewed client distributions or fails under domain shifts. We propose adapting Linear Probing followed by full Fine-Tuning (LP-FT), a principled centralized strategy for alleviating feature distortion (Kumar et al., 2022), to the FL setting. Through systematic evaluation across seven datasets and six PFT variants, we demonstrate LP-FT's superiority in balancing personalization and generalization. Our analysis uncovers federated feature distortion, a phenomenon where local fine-tuning destabilizes globally learned features, and theoretically characterizes how LP-FT mitigates this via phased parameter updates. We further establish conditions (e.g., partial feature overlap, covariate-concept shift) under which LP-FT outperforms standard fine-tuning, offering actionable guidelines for deploying robust personalization in FL.
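To make "phased parameter updates" concrete, here is a toy sketch in plain Python, not the authors' code. It assumes a scalar-output two-layer linear model f(x) = V^T B x with E[x x^T] = I, so the population loss of a client whose labels are y = w^T x has the closed form 0.5 * ||B^T V - w||^2. It compares full FT (update B and V jointly) with LP-FT (update the head V with B frozen, then update both), using ||B - B0||_F as a hypothetical proxy for feature distortion. All function names, dimensions, and the client concept `w_client` are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def effective_weights(B: Matrix, V: Vector) -> Vector:
    """End-to-end linear predictor B^T V of the model f(x) = V^T B x."""
    return [sum(V[j] * B[j][c] for j in range(len(B))) for c in range(len(B[0]))]


def loss(B: Matrix, V: Vector, w: Vector) -> float:
    """Population squared loss 0.5 * ||B^T V - w||^2 (assumes E[x x^T] = I)."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(effective_weights(B, V), w))


def grads(B: Matrix, V: Vector, w: Vector) -> Tuple[Matrix, Vector]:
    """Row j of dL/dB is V_j * g^T and dL/dV_j = b_j . g, with residual g = B^T V - w."""
    g = [a - b for a, b in zip(effective_weights(B, V), w)]
    dB = [[V[j] * gc for gc in g] for j in range(len(B))]
    dV = [sum(bc * gc for bc, gc in zip(B[j], g)) for j in range(len(B))]
    return dB, dV


def train(B: Matrix, V: Vector, w: Vector, steps: int, lr: float,
          update_B: bool) -> Tuple[Matrix, Vector]:
    """Gradient descent on (B, V); the features B stay frozen when update_B is False."""
    B, V = [row[:] for row in B], V[:]
    for _ in range(steps):
        dB, dV = grads(B, V, w)  # both gradients use the same (old) parameters
        V = [v - lr * gv for v, gv in zip(V, dV)]
        if update_B:
            B = [[b - lr * gb for b, gb in zip(rb, rg)] for rb, rg in zip(B, dB)]
    return B, V


def full_ft(B0: Matrix, V0: Vector, w: Vector, steps: int = 200, lr: float = 0.05):
    """Full fine-tuning: update features and head jointly from the start."""
    return train(B0, V0, w, steps, lr, update_B=True)


def lp_ft(B0: Matrix, V0: Vector, w: Vector, lp_steps: int = 200,
          ft_steps: int = 200, lr: float = 0.05):
    """LP-FT: stage 1 fits the head with frozen features, stage 2 fine-tunes everything."""
    B, V = train(B0, V0, w, lp_steps, lr, update_B=False)
    return train(B, V, w, ft_steps, lr, update_B=True)


def distortion(B: Matrix, B0: Matrix) -> float:
    """||B - B0||_F, a simple proxy for how far the features moved."""
    return sum((b - b0) ** 2 for rb, rb0 in zip(B, B0) for b, b0 in zip(rb, rb0)) ** 0.5


if __name__ == "__main__":
    B0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical global features (k=2, d=3)
    V0 = [1.0, 1.0]                           # global head, so B0^T V0 = [1, 1, 0]
    w_client = [1.0, -1.0, 0.0]               # hypothetical client concept (concept shift)
    for name, (B, V) in [("full FT", full_ft(B0, V0, w_client)),
                         ("LP-FT", lp_ft(B0, V0, w_client))]:
        print(name, "local loss:", loss(B, V, w_client), "distortion:", distortion(B, B0))
```

In this toy, the client concept lies in the span of the global features, so linear probing alone can fit it and the subsequent FT stage barely moves B; full FT instead starts with a large residual and changes B from the first step.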

Paper Structure

This paper contains 32 sections, 3 theorems, 53 equations, 6 figures, 7 tables.

Key Result

Lemma 4.3

Under the data-model and model-structure assumptions, and assuming that $\mathbb{E}_{x \sim \mathcal{D}_i}[x x^T] = I_d$ for all clients $i \in [C]$, let the initial parameters before fine-tuning be $B_0 = B_*$ and $V_0 = (\ldots)^T$. Assume fine-tuning is performed locally on the data of the $i$-th client […], where $(b_j^*)^T$ is the $j$-th row of $B_*$ and $(V_0)_j$ is the $j$-th element of $V_0$, for […]
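The lemma statement is cut off above, but the role of $(b_j^*)^T$ and $(V_0)_j$ can be seen from a generic one-step gradient computation. This is a sketch, not the lemma itself: it assumes a scalar-output model $f(x) = V^T B x$ with $B \in \mathbb{R}^{k \times d}$, $V \in \mathbb{R}^{k}$, squared loss, and client labels $y = w_i^T x$, all of which are our assumptions rather than the paper's stated setup.

```latex
% Assumed setup: f(x) = V^T B x, labels y = w_i^T x, and E_{x ~ D_i}[x x^T] = I_d
L_i(B, V) = \tfrac{1}{2}\,\mathbb{E}_{x \sim \mathcal{D}_i}\big[(V^T B x - w_i^T x)^2\big]
          = \tfrac{1}{2}\,(B^T V - w_i)^T\,\mathbb{E}_{x \sim \mathcal{D}_i}[x x^T]\,(B^T V - w_i)
          = \tfrac{1}{2}\,\lVert B^T V - w_i \rVert_2^2

\nabla_B L_i = V\,(B^T V - w_i)^T, \qquad \nabla_V L_i = B\,(B^T V - w_i)

% One FT step of size \eta from B_0 = B_*, written row by row (j = 1, \dots, k):
(b_j^{(1)})^T = (b_j^*)^T - \eta\,(V_0)_j\,(B_*^T V_0 - w_i)^T
```

Under these assumptions, every row of the feature matrix moves along the same residual direction $B_*^T V_0 - w_i$, scaled by the corresponding head entry $(V_0)_j$; if the head already fits the client (as after a linear-probing stage), the residual and hence the feature update shrink.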

Figures (6)

  • Figure 1: Overview of the problem setting and FL strategies investigated in this paper. (a) PFT framework, where each client fine-tunes a global model trained via GFL (e.g., FedAvg in this paper). Unlike process-integrated PFL, PFT focuses solely on the final fine-tuning stage, with no further communication. (b) Three FL models (the global FL model, the full-parameter FT (full FT) model, and the LP-FT model), with their parameter-update patterns and local/global performance (perf.) under data heterogeneity. The fire icon indicates an actively tuned parameter, the frozen icon a fixed weight, and the mixed fire-frozen icon a weight that is not actively tuned. (c) Visualization of feature distortion under PFL and its possible link to global generalization.
  • Figure 2: Prevalence of personalization overfitting across different distribution-shift scenarios: (a) global and local accuracy under different learning rates for full-parameter fine-tuning; (b) different sparsity rates for sparse fine-tuning; (c) different regularization strengths for proximal fine-tuning. In all subfigures, global accuracy is shown as a solid line and local accuracy as a dashed line. Global accuracy consistently declines while local accuracy increases or remains stable across hyperparameter settings, suggesting that PFT baseline methods are prone to overfitting even with careful hyperparameter tuning.
  • Figure 3: Observations of feature distortion in the PFT setting: (a) the positive correlation between global performance drops and feature distortion intensity on DomainNet; (b) an ablation study on preserving federated features with controlled local training loss on Digit5. We set local loss thresholds (0.1, 0.5, and 1.0) and applied gradient ascent whenever the loss fell below the threshold, so that the training loss fluctuated around it.
  • Figure 4: Illustration of federated feature distortion (FD) and decision boundaries.
  • Figure 5: Visualization of the original datasets used in the paper.
  • ...and 1 more figure
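The loss-threshold control described for Figure 3(b) can be sketched on a toy problem: take a gradient-descent step while the loss is at or above the threshold and a gradient-ascent step once it falls below, so the loss hovers around the threshold instead of going to zero. The one-dimensional quadratic objective, learning rate, and function names below are hypothetical illustrations, not the paper's training setup.

```python
from typing import List


def thresholded_step(theta: float, grad: float, loss: float,
                     threshold: float, lr: float) -> float:
    """Descend while loss >= threshold; ascend (flip the step sign) once it falls below."""
    direction = -1.0 if loss >= threshold else 1.0
    return theta + direction * lr * grad


def run(threshold: float, steps: int = 200, lr: float = 0.1,
        theta: float = 0.0, target: float = 3.0) -> List[float]:
    """Toy objective L(theta) = 0.5 * (theta - target)^2; returns the loss at each step."""
    history = []
    for _ in range(steps):
        loss = 0.5 * (theta - target) ** 2
        grad = theta - target  # dL/dtheta
        history.append(loss)
        theta = thresholded_step(theta, grad, loss, threshold, lr)
    return history


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0):  # the threshold values listed in the caption
        tail = run(t)[-20:]
        print(f"threshold={t}: loss in last 20 steps ranges {min(tail):.3f}..{max(tail):.3f}")
```

Because each ascent step undoes roughly one descent step, the loss oscillates in a narrow band around the chosen threshold, which is the "fluctuated around these points" behavior the caption describes.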

Theorems & Definitions (7)

  • Lemma 4.3
  • Theorem 4.4
  • Theorem 4.5
  • Remark 4.6
  • Proof of Lemma 4.3
  • Proof of the concept-shift theorem
  • Proof of the covariate-and-concept-shift theorem