pFedAFM: Adaptive Feature Mixture for Batch-Level Personalization in Heterogeneous Federated Learning

Liping Yi, Han Yu, Chao Ren, Heng Zhang, Gang Wang, Xiaoguang Liu, Xiaoxiao Li

TL;DR

pFedAFM addresses batch-level heterogeneity in model-heterogeneous personalized FL by introducing a global homogeneous feature extractor $\mathcal{G}(\theta)$ shared across clients and a locally heterogeneous model $\mathcal{F}_k(\omega_k)$. It trains iteratively to enable bidirectional knowledge transfer between global and local components, using per-dimension trainable weights to mix generalized and personalized representations for each batch, ensuring adaptive batch-level personalization. The approach achieves a non-convex convergence rate of $\mathcal{O}(1/T)$ and demonstrates up to $7.93\%$ accuracy gains over seven state-of-the-art baselines on CIFAR-10/100 under both pathological and practical non-IID settings, with lower communication and computation costs. These results suggest pFedAFM provides robust, scalable personalization in heterogeneous FL, balancing global generalization and local specialization without requiring public data.
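
The per-dimension mixing described above can be illustrated with a minimal NumPy sketch. It assumes the mixture is a convex combination, `alpha * global + (1 - alpha) * local`, applied to every sample in a batch; the summary only states that per-dimension trainable weights mix the two representations, so the exact form, the function name `mix_features`, and the toy values are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def mix_features(rep_global, rep_local, alpha):
    """Mix two (batch, dim) feature matrices with one weight per feature dimension.

    Assumed form: alpha * rep_global + (1 - alpha) * rep_local,
    with alpha broadcast across the batch axis.
    """
    rep_global = np.asarray(rep_global, dtype=float)
    rep_local = np.asarray(rep_local, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    if rep_global.shape != rep_local.shape:
        raise ValueError("both extractors must emit features of the same shape")
    if alpha.shape != (rep_global.shape[-1],):
        raise ValueError("alpha needs exactly one weight per feature dimension")
    return alpha * rep_global + (1.0 - alpha) * rep_local

# Toy batch of two samples with three feature dimensions (hypothetical values).
rep_g = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # generalized features
rep_l = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])  # personalized features
alpha = np.array([0.0, 0.5, 1.0])                     # per-dimension weights
mixed = mix_features(rep_g, rep_l, alpha)
```

With these weights, the first dimension takes only the personalized feature, the last takes only the generalized feature, and the middle one averages the two. In the method itself the weights are trainable, so this balance can shift as batches change.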

Abstract

Model-heterogeneous personalized federated learning (MHPFL) enables FL clients to train structurally different personalized models on non-independent and identically distributed (non-IID) local data. Existing MHPFL methods focus on achieving client-level personalization, but cannot address batch-level data heterogeneity. To bridge this important gap, we propose a model-heterogeneous personalized Federated learning approach with Adaptive Feature Mixture (pFedAFM) for supervised learning tasks. It consists of three novel designs: 1) A shared global homogeneous small feature extractor is assigned alongside each client's local heterogeneous model (consisting of a heterogeneous feature extractor and a prediction header) to facilitate cross-client knowledge fusion. The two feature extractors share the local heterogeneous model's prediction header, which contains rich personalized prediction knowledge, to retain personalized prediction capabilities. 2) An iterative training strategy alternately trains the global homogeneous small feature extractor and the local heterogeneous large model for effective global-local knowledge exchange. 3) A trainable weight vector dynamically mixes the features extracted by both feature extractors to adapt to batch-level data heterogeneity. Theoretical analysis proves that pFedAFM converges over time. Extensive experiments on 2 benchmark datasets demonstrate that it significantly outperforms 7 state-of-the-art MHPFL methods, achieving up to 7.93% accuracy improvement while incurring low communication and computation costs.
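
The alternating training in design 2 can be sketched on a toy linear model with hand-written gradients. The sketch assumes a squared-error loss, linear feature extractors, and a two-phase schedule per client: first update only the global extractor through the frozen shared header, then freeze it and update the local extractor, the header, and the weight vector on the mixed features. The abstract only says the two parts are trained alternately, so the phase contents, the granularity of alternation, and all names (`train_global_step`, `train_local_step`, `predict`) are assumptions for illustration, not the authors' code.

```python
import numpy as np

def predict(x, w_global, w_local, header, alpha):
    """Apply the shared header to the per-dimension mixture of both feature sets."""
    mixed = alpha * (x @ w_global) + (1.0 - alpha) * (x @ w_local)
    return mixed @ header

def train_global_step(x, y, w_global, header, lr):
    """Phase 1 (assumed): update only the global extractor; the header is frozen."""
    err = (x @ w_global) @ header - y              # residuals, shape (n,)
    grad = x.T @ np.outer(err, header) / len(y)    # gradient of 0.5 * mean(err**2)
    return w_global - lr * grad

def train_local_step(x, y, w_global, w_local, header, alpha, lr):
    """Phase 2 (assumed): global extractor frozen; update local extractor, header, alpha."""
    rep_g, rep_l = x @ w_global, x @ w_local
    mixed = alpha * rep_g + (1.0 - alpha) * rep_l
    err = mixed @ header - y
    grad_mixed = np.outer(err, header) / len(y)    # dLoss/dmixed, shape (n, r)
    grad_header = mixed.T @ err / len(y)
    grad_local = x.T @ (grad_mixed * (1.0 - alpha))
    grad_alpha = (grad_mixed * (rep_g - rep_l)).sum(axis=0)
    return (w_local - lr * grad_local,
            header - lr * grad_header,
            alpha - lr * grad_alpha)

# Hypothetical client data: 32 samples, 5 inputs, 3 feature dimensions.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(32, 5)), rng.normal(size=32)
w_global = 0.1 * rng.normal(size=(5, 3))
w_local = 0.1 * rng.normal(size=(5, 3))
header = 0.1 * rng.normal(size=3)
alpha = np.full(3, 0.5)

batches = np.array_split(np.arange(32), 4)         # four mini-batches
for idx in batches:                                # phase 1 over all batches
    w_global = train_global_step(x[idx], y[idx], w_global, header, lr=0.05)
for idx in batches:                                # phase 2 over all batches
    w_local, header, alpha = train_local_step(
        x[idx], y[idx], w_global, w_local, header, alpha, lr=0.05)
```

Because `alpha` receives a gradient on every mini-batch in the second phase, the mixing weights follow the data in each batch, which is the sense in which the mixture adapts at batch level. In a federated round, only the small global extractor would be exchanged with the server; that step is omitted here.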

Paper Structure

This paper contains 29 sections, 4 theorems, 34 equations, 9 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

Local Training. Given the Lipschitz-smoothness and unbiased-gradient assumptions (labels `assump:Lipschitz` and `assump:Unbiased`), the loss of an arbitrary client's local model $h$ in the $(t+1)$-th local training round is bounded by:

Figures (9)

  • Figure 1: Feature extractor and prediction header.
  • Figure 2: Workflow of pFedAFM.
  • Figure 3: Average accuracy over training rounds. Standalone or FedProto is the best baseline in each setting of the model-heterogeneous comparison table (label `tab:compare-hetero`).
  • Figure 4: Accuracy variance of individual clients.
  • Figure 5: Rounds, communication, and computation required to reach a target mean accuracy of $90\%$ on CIFAR-10 and $50\%$ on CIFAR-100.
  • ...and 4 more figures

Theorems & Definitions (4)

  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Theorem 2