HEART-PFL: Stable Personalized Federated Learning under Heterogeneity with Hierarchical Directional Alignment and Adversarial Knowledge Transfer

Minjun Kim, Minje Kim

Abstract

Personalized Federated Learning (PFL) aims to deliver effective client-specific models under heterogeneous distributions, yet existing methods suffer from shallow prototype alignment and brittle server-side distillation. We propose HEART-PFL, a dual-sided framework that (i) performs depth-aware Hierarchical Directional Alignment (HDA) using cosine similarity in the early stage and MSE matching in the deep stage to preserve client specificity, and (ii) stabilizes global updates through Adversarial Knowledge Transfer (AKT) with symmetric KL distillation on clean and adversarial proxy data. Using lightweight adapters with only 1.46M trainable parameters, HEART-PFL achieves state-of-the-art personalized accuracy on CIFAR-100, Flowers-102, and Caltech-101 (63.42%, 84.23%, and 95.67%, respectively) under Dirichlet non-IID partitions, and remains robust to out-of-domain proxy data. Ablation studies further confirm that HDA and AKT provide complementary gains in alignment, robustness, and optimization stability, offering insights into how the two components mutually reinforce effective personalization. Overall, these results demonstrate that HEART-PFL simultaneously enhances personalization and global stability, highlighting its potential as a strong and scalable solution for PFL. Code is available at https://github.com/danny0628/HEART-PFL.
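
The depth-aware alignment in (i) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one feature vector and one prototype per stage for a single class, an unweighted sum over stages, and a hypothetical `num_early` parameter that sets how many shallow stages use cosine distance before MSE takes over. The function names are illustrative only.

```python
import math


def cosine_distance(u, v, eps=1e-12):
    """1 - cos(u, v): penalizes direction mismatch and ignores magnitude."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / max(norm_u * norm_v, eps)


def mse(u, v):
    """Mean squared error: penalizes both direction and magnitude mismatch."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def hda_loss(global_feats, client_protos, num_early):
    """Hypothetical depth-aware alignment loss (illustrative, not the paper's code).

    global_feats and client_protos hold one vector per stage, ordered from
    shallow to deep. The first `num_early` stages use cosine distance and
    the remaining stages use MSE. Stages are summed without weights, which
    is an assumption of this sketch.
    """
    total = 0.0
    for stage, (feat, proto) in enumerate(zip(global_feats, client_protos)):
        if stage < num_early:
            total += cosine_distance(feat, proto)
        else:
            total += mse(feat, proto)
    return total


# Two stages: the early pair differs only in scale, the deep pair in value.
feats = [[2.0, 0.0], [1.0, 1.0]]
protos = [[1.0, 0.0], [1.0, 3.0]]
loss = hda_loss(feats, protos, num_early=1)
```

In this toy input the early-stage term is zero because the two vectors point in the same direction, so the whole loss comes from the deep-stage MSE term. That matches the intuition in the abstract: the early stage constrains direction only, and the deep stage matches values.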

Paper Structure (14 sections, 12 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: Overview of our proposed HDA. HDA extracts a hierarchical set of class-wise prototypes from each client's personalized model. It then aligns features from the global model with these client-specific prototypes using a semantic-aware mechanism: cosine similarity for the early stage and MSE for the deep stage. This alignment mechanism is formulated as our proposed HDA loss, $\mathcal{L}_{\text{HDA}}$.
  • Figure 2: Overview of our proposed AKT. To enhance the robustness of the global adapter, AKT performs knowledge distillation using both clean and adversarially generated proxy samples.
  • Figure 3: Out-of-domain setting on CIFAR-100 and Caltech-101. These experiments use HEART-PFL to measure the out-of-domain performance of AKT.
  • Figure 4: Layer-wise ablation study of HDA. We evaluate five configurations: a baseline using MSE loss (without cosine similarity) and settings with cosine similarity applied progressively from one to all layers. The results show that increasing the depth of directional alignment leads to consistent improvements in test accuracy (blue bars), representation alignment (purple line), and feature norm variance (green dashed line).
  • Figure 5: Personalized and global test performance under component ablations of AKT on CIFAR-100 with Dirichlet partitions ($\alpha=0.1$). We compare full AKT (Ours) against variants using only clean samples (Clean), using only adversarial perturbation (Adv), without adversarial perturbation but with symmetric KL (Clean+sKL), without symmetric KL but with adversarial perturbation (Clean+Adv), and without clean samples (Adv+sKL). Across both metrics, the full AKT configuration achieves the best accuracy.
  • ...and 1 more figure
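
The AKT description in the Figure 2 and Figure 5 captions (symmetric KL distillation on clean and adversarial proxy samples) can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's objective. The 0.5 averaging of the two KL directions, the equal weighting of the clean and adversarial terms, and the names `source_*` and `global_*` are all hypothetical. How the adversarial proxy samples are generated is not covered, so the function takes logits that were already computed on them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p), so neither side is privileged."""
    return 0.5 * (kl(p, q) + kl(q, p))


def akt_loss(source_clean, global_clean, source_adv, global_adv):
    """Hypothetical AKT-style loss (illustrative, not the paper's code).

    Each argument is a list of logits for one proxy sample. The *_clean
    logits come from the clean sample and the *_adv logits from its
    adversarially perturbed version. The two terms are summed with equal
    weight, which is an assumption of this sketch.
    """
    clean_term = symmetric_kl(softmax(source_clean), softmax(global_clean))
    adv_term = symmetric_kl(softmax(source_adv), softmax(global_adv))
    return clean_term + adv_term


# Identical predictions give zero loss; disagreement gives a positive loss.
same = akt_loss([1.0, 2.0], [1.0, 2.0], [0.5, 0.5], [0.5, 0.5])
diff = akt_loss([3.0, 0.0], [0.0, 3.0], [1.0, 0.0], [0.0, 1.0])
```

Dropping `adv_term` corresponds loosely to the Clean+sKL variant named in the Figure 5 caption, and dropping `clean_term` to Adv+sKL. That makes the sketch a convenient starting point for reproducing the ablation layout, though not its numbers.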