Tackling Feature-Classifier Mismatch in Federated Learning via Prompt-Driven Feature Transformation

Xinghao Wu, Jianwei Niu, Xuefeng Liu, Mingjia Shi, Guogang Zhu, Shaojie Tang

TL;DR

This work tackles non-IID data in Federated Learning by identifying a mismatch between locally extracted features and the global classifier as a key source of FedAvg's underperformance. It introduces FedPFT, which inserts a prompt-driven feature transformation module between a shared global feature extractor and classifier. Per-client prompts align local features with the classifier, and a MoCo-based contrastive objective further improves the feature extractor. Training alternates between two phases in each round: clients first train their prompts so that transformed features match the global classifier, then train the model parameters. Across CIFAR-10, CIFAR-100, and Tiny ImageNet, FedPFT surpasses state-of-the-art methods by up to 7.08%, performs robustly under both Dirichlet and pathological non-IID settings, and yields visibly better feature separability.

Abstract

In traditional Federated Learning approaches like FedAvg, the global model underperforms when faced with data heterogeneity. Personalized Federated Learning (PFL) enables clients to train personalized models that better fit their local data distributions. However, we surprisingly find that the feature extractor in FedAvg is superior to those in most PFL methods. More interestingly, by applying a linear transformation to the local features produced by the feature extractor so that they align with the classifier, FedAvg can surpass the majority of PFL methods. This suggests that FedAvg's inadequate performance stems primarily from the mismatch between the locally extracted features and the classifier. While current PFL methods mitigate this issue to some extent, their designs compromise the quality of the feature extractor, thus limiting the full potential of PFL. In this paper, we propose a new PFL framework called FedPFT to address the mismatch problem while enhancing the quality of the feature extractor. FedPFT integrates a feature transformation module, driven by personalized prompts, between the global feature extractor and classifier. In each round, clients first train prompts to transform local features to match the global classifier, and then train the model parameters. This approach also aligns the training objectives of clients, reducing the impact of data heterogeneity on model collaboration. Moreover, FedPFT's feature transformation module is highly scalable, allowing different prompts to tailor local features to various tasks. Leveraging this, we introduce a collaborative contrastive learning task to further refine feature extractor quality. Our experiments demonstrate that FedPFT outperforms state-of-the-art methods by up to 7.08%.
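The abstract's observation that a linear transformation on local features can realign them with a fixed classifier can be illustrated with a toy sketch. This is not the authors' experiment: the 2-D features, the frozen classifier `C`, the deliberately "swapped" local features, and the helper names (`fit_transform`, `mean_loss`) are all assumptions made for illustration. Only the transformation matrix `T` is trained; the classifier stays frozen.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def mean_loss(T, C, feats, labels):
    """Average cross-entropy of the frozen classifier C on transformed features T @ h."""
    total = 0.0
    for h, y in zip(feats, labels):
        p = softmax(matvec(C, matvec(T, h)))
        total -= math.log(p[y] + 1e-12)
    return total / len(feats)

def fit_transform(C, feats, labels, lr=0.2, steps=200):
    """Learn a linear map T by gradient descent while keeping C fixed."""
    d, n = len(feats[0]), len(feats)
    T = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # identity init
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(d)]
        for h, y in zip(feats, labels):
            p = softmax(matvec(C, matvec(T, h)))
            err = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]  # dL/dlogits
            g = [sum(C[k][i] * err[k] for k in range(len(C))) for i in range(d)]  # C^T err
            for i in range(d):
                for j in range(d):
                    grad[i][j] += g[i] * h[j]  # outer product (C^T err) h^T
        T = [[T[i][j] - lr * grad[i][j] / n for j in range(d)] for i in range(d)]
    return T

# Hypothetical setup: the frozen "global" classifier expects class 0 along axis 0,
# but this client's features place class 0 along axis 1 (a feature-classifier mismatch).
rng = random.Random(0)
C = [[2.0, 0.0], [0.0, 2.0]]
labels = [i % 2 for i in range(40)]
centers = {0: (0.0, 1.0), 1: (1.0, 0.0)}
feats = [[c + rng.gauss(0.0, 0.1) for c in centers[y]] for y in labels]

identity = [[1.0, 0.0], [0.0, 1.0]]
before = mean_loss(identity, C, feats, labels)
T = fit_transform(C, feats, labels)
after = mean_loss(T, C, feats, labels)
print(f"loss without transform: {before:.3f}, with learned transform: {after:.3f}")
```

In this toy setting the feature extractor's output is perfectly separable, yet the frozen classifier misreads it until the features are remapped, which mirrors the mismatch the abstract describes. FedPFT itself uses a prompt-driven module rather than a plain matrix.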

Paper Structure (47 sections, 7 theorems, 26 equations, 12 figures, 11 tables, 1 algorithm)

Key Result

Proposition K.1

If $f$ is $L$-smooth, $\forall x,y$ we have:
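
The inequality itself is missing from this excerpt. A common textbook form of the $L$-smoothness bound, given here as an assumption rather than as the paper's exact statement, is:

$$
f(y) \le f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{L}{2}\,\lVert y - x \rVert^{2}.
$$

This form follows from the gradient being $L$-Lipschitz, i.e. $\lVert \nabla f(x) - \nabla f(y) \rVert \le L \lVert x - y \rVert$.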

Figures (12)

  • Figure 1: Overview of FedPFT. (a) The training process of each client $i$ in one communication round. (b) The feature transformation module in FedPFT.
  • Figure 2: t-SNE visualization of features extracted by different methods on the CIFAR-10 dataset.
  • Figure 3: The effect of different prompts on feature space.
  • Figure 4: Visualization of data partitioning in Dirichlet non-IID scenarios with different $\alpha$.
  • Figure 6: Visualization of attention weights for different prompts on one client, on the CIFAR-10 dataset under the Dirichlet non-IID scenario.
  • ...and 7 more figures
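
The Dirichlet non-IID partitioning referred to in Figure 4 is commonly implemented by drawing, for each class, a vector of per-client proportions from a Dirichlet distribution with concentration $\alpha$. The sketch below shows that common recipe using only the Python standard library. The function name `dirichlet_partition` and its parameters are hypothetical, and the paper's own partitioning code may differ in details.

```python
import random
from collections import defaultdict

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients using per-class Dirichlet(alpha) proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws normalized to sum to 1.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas) or 1.0  # guard against numerical underflow to zero
        props = [g / total for g in gammas]

        # Cut this class's shuffled indices at the cumulative proportions.
        n = len(idxs)
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            end = n if c == num_clients - 1 else min(n, max(start, round(cum * n)))
            clients[c].extend(idxs[start:end])
            start = end
    return clients

# Toy usage: 1000 samples over 10 classes, split across 5 clients.
toy_labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(toy_labels, num_clients=5, alpha=0.5)
print([len(p) for p in parts])
```

Smaller values of `alpha` concentrate each class on fewer clients, giving more skewed label distributions; larger values approach an even split.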

Theorems & Definitions (10)

  • Proposition K.1: $L$-smooth
  • Proposition K.2: Jensen's inequality
  • Proposition K.3: Triangle inequality
  • Proposition K.4: Matrix norm compatibility
  • Proposition K.5: Peter-Paul inequality
  • Lemma K.1: Bounded local approximation error
  • proof
  • Theorem K.2: Non-convex and smooth convergence of FedPFT
  • proof
  • Remark K.2.1