OvA-LP: A Simple and Efficient Framework for Federated Learning on Non-IID Data

Dongjin Park, Hasung Yeo, Joon-Woo Lee

TL;DR

This work tackles the robustness gap in federated fine-tuning (FFT) under non-IID client distributions by addressing drift at its source. It introduces OvA-LP, a minimalist framework that freezes the encoder, applies linear probing, and trains a one-vs-all head in two stages to decouple logits and control feature and label skew, framed in terms of the bias and variance that drift induces. On CIFAR-100 with 100 clients, OvA-LP retains about 95.9% of its IID accuracy and remains resilient to label noise, at markedly lower computation and communication cost than post-hoc baselines such as FFT-MoE and PFPT. The approach offers a principled, modular baseline that can complement existing aggregation or personalization techniques for robust FFT in highly heterogeneous environments.

Abstract

Federated fine-tuning (FFT) adapts foundation models to decentralized data but remains fragile under heterogeneous client distributions due to local drift, i.e., client-level update divergences that induce systematic bias and amplified variance in the global model. Existing aggregation and personalization methods largely correct drift post hoc, which proves brittle under extreme non-IID conditions. We introduce OvA-LP, a minimalist framework that is, to our knowledge, the first explicitly designed to suppress drift at its source within the PEFT-based FFT paradigm. OvA-LP combines linear probing on a frozen encoder with a one-vs-all head and a simple two-stage procedure, preserving pretrained feature geometry and decoupling logits to prevent the mechanisms that amplify drift. On CIFAR-100 with 100 clients, averaged over shard-1, shard-2, and Bernoulli-Dirichlet partitions, OvA-LP retains 95.9% of its IID accuracy, whereas state-of-the-art FFT baselines retain only 10.1% (PFPT) and 34.5% (FFT-MoE) under the same conditions. OvA-LP further maintains resilience under both symmetric and asymmetric label noise. In addition, precomputing encoder features makes per-round cost nearly independent of encoder size. Together, these results demonstrate that OvA-LP provides a principled and efficient basis for robust FFT under heterogeneity.
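To make the logit-decoupling idea concrete, the following is a minimal sketch, not the authors' implementation. It trains a linear one-vs-all head (independent sigmoid per class with binary cross-entropy) on fixed feature vectors that stand in for precomputed frozen-encoder outputs. Several parts are assumptions made for illustration only: the toy class prototypes, the learning rate and step counts, the helper names (`local_update`, `fedavg`, `run_demo`), and the use of plain sample-weighted averaging as the aggregation rule. The two-stage procedure is not detailed in this summary, so it is omitted.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logits(W, b, x):
    """Per-class scores of a linear head on one feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + bc for row, bc in zip(W, b)]


def local_update(W, b, feats, labels, lr=0.5, steps=5):
    """Full-batch gradient steps on the summed per-class binary cross-entropy."""
    W = [row[:] for row in W]  # copy so the caller's global head is not mutated
    b = b[:]
    n = len(feats)
    for _ in range(steps):
        gW = [[0.0] * len(row) for row in W]
        gb = [0.0] * len(b)
        for x, y in zip(feats, labels):
            for c, z in enumerate(logits(W, b, x)):
                # The BCE gradient for logit c depends only on logit c;
                # softmax would couple all logits through one normaliser.
                err = sigmoid(z) - (1.0 if c == y else 0.0)
                gb[c] += err / n
                for j, xj in enumerate(x):
                    gW[c][j] += err * xj / n
        for c in range(len(W)):
            b[c] -= lr * gb[c]
            for j in range(len(W[c])):
                W[c][j] -= lr * gW[c][j]
    return W, b


def fedavg(heads, sizes):
    """Sample-size-weighted average of (W, b) heads (assumed aggregation rule)."""
    total = float(sum(sizes))
    W0, b0 = heads[0]
    W = [[sum(s * h[0][c][j] for h, s in zip(heads, sizes)) / total
          for j in range(len(W0[0]))] for c in range(len(W0))]
    b = [sum(s * h[1][c] for h, s in zip(heads, sizes)) / total
         for c in range(len(b0))]
    return W, b


def run_demo(num_classes=3, rounds=10):
    """Toy run: each client holds a single class; returns global accuracy."""
    # Hypothetical stand-in for precomputed features: one prototype per class.
    protos = [[1.0 if j == c else 0.1 for j in range(num_classes)]
              for c in range(num_classes)]
    clients = [([protos[c]] * 4, [c] * 4) for c in range(num_classes)]
    W = [[0.0] * num_classes for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(rounds):
        heads = [local_update(W, b, f, y) for f, y in clients]
        W, b = fedavg(heads, [len(y) for _, y in clients])
    correct = 0
    for c, p in enumerate(protos):
        z = logits(W, b, p)
        correct += int(z.index(max(z)) == c)
    return correct / num_classes


if __name__ == "__main__":
    print("global accuracy on class prototypes:", run_demo())
```

Because only the small linear head is trained and exchanged, the per-round work in this sketch does not involve the encoder at all, which mirrors the abstract's point that precomputing features makes per-round cost nearly independent of encoder size.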

Paper Structure

This paper contains 44 sections, 4 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Overall structure of OvA-LP. Clients precompute encoder features once (left) and perform two-stage local training with one-vs-all heads (right).
  • Figure 2: Feature geometry of pretrained vs randomly initialized encoders (CIFAR-10, ViT-L/16).
  • Figure 3: Ablation of OvA-LP components. Stepwise gains (56.3 $\rightarrow$ 95.4 $\rightarrow$ 95.9) illustrate the effects of OvA decoupling and two-stage training.
  • Figure 4: Comparison with state-of-the-art baselines. FFT-MoE plateaus near 10.1%, while PFPT rises slowly but saturates at 34.5%. OvA-LP remains stable and converges to 95.9%.
  • Figure 5: Partition-wise robustness of OvA-LP. Across five representative heterogeneity patterns, $R(t)$ curves (left) closely track the IID reference and final $R(50)$ values (right) remain above 94.9%. This confirms consistent robustness across diverse forms of skew, including label, feature, and quantity heterogeneity.
  • ...and 4 more figures