FedDr+: Stabilizing Dot-regression with Global Feature Distillation for Federated Learning

Seongyoon Kim; Minchan Jeong; Sungnyun Kim; Sungwoo Cho; Sumyeong Ahn; Se-Young Yun


TL;DR

Federated learning under non-IID data suffers from client drift, especially when the last classifier layer is misaligned with heterogeneous feature extractors. The paper investigates the dot-regression loss ($\mathcal{L}_{\text{DR}}$) with a frozen simplex ETF classifier and finds that it gives strong local alignment but handles unseen classes poorly, which hurts the global model. To address this, FedDr+ introduces a feature distillation loss ($\mathcal{L}_{\text{FD}}$) and forms $\mathcal{L}_{\text{Dr+}} = \beta \mathcal{L}_{\text{DR}} + (1-\beta) \mathcal{L}_{\text{FD}}$, preserving global knowledge while maintaining alignment. Experiments on CIFAR-10/100 show that FedDr+ outperforms existing frozen-classifier methods in both global and personalized FL across diverse non-IID settings. The method prevents forgetting of unseen classes during local updates while retaining fast feature alignment.
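The weighted combination above can be sketched for a single sample as follows. This is a minimal illustration, not the authors' implementation: the summary does not spell out the two loss terms, so the sketch assumes the squared-cosine form $\frac{1}{2}(\cos(\cdot,\cdot) - 1)^2$ for both, with the target being the frozen class vector for $\mathcal{L}_{\text{DR}}$ and the global model's feature for $\mathcal{L}_{\text{FD}}$. All function and parameter names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dot_regression_loss(local_feature, class_vector):
    # ASSUMED form: pull the feature toward the frozen classifier vector
    # of the true class; there is no term pushing away from other classes.
    return 0.5 * (cosine(local_feature, class_vector) - 1.0) ** 2


def feature_distillation_loss(local_feature, global_feature):
    # ASSUMED form: keep the local feature close (in angle) to the feature
    # that the global model produces for the same input.
    return 0.5 * (cosine(local_feature, global_feature) - 1.0) ** 2


def feddr_plus_loss(local_feature, global_feature, class_vector, beta):
    """beta * L_DR + (1 - beta) * L_FD for one sample."""
    l_dr = dot_regression_loss(local_feature, class_vector)
    l_fd = feature_distillation_loss(local_feature, global_feature)
    return beta * l_dr + (1.0 - beta) * l_fd


if __name__ == "__main__":
    # Toy 2-D features: the local feature matches the global feature
    # but is orthogonal to the class vector.
    print(feddr_plus_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], beta=0.9))
```

With $\beta = 1$ the sketch reduces to the dot-regression term alone; smaller $\beta$ puts more weight on staying close to the global model's features.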

Abstract

Federated Learning (FL) has emerged as a pivotal framework for developing effective global models (global FL) or personalized models (personalized FL) across clients with heterogeneous, non-IID data distributions. A key challenge in FL is client drift, where data heterogeneity impedes the aggregation of scattered knowledge. Recent studies have tackled client drift by identifying significant divergence in the last classifier layer. To mitigate this divergence, strategies such as freezing the classifier weights and aligning the feature extractor accordingly have proven effective. Although the local alignment between classifier and feature extractor has been studied as a crucial factor in FL, we observe that it may lead the model to overemphasize the classes observed within each client. Thus, our objectives are twofold: (1) enhancing local alignment while (2) preserving the representation of unseen class samples. This approach aims to integrate knowledge from individual clients effectively, thereby improving performance for both global and personalized FL. To achieve this, we introduce a novel algorithm named FedDr+, which strengthens local model alignment using the dot-regression loss. FedDr+ freezes the classifier as a simplex ETF to align the features and improves the aggregated global model by employing a feature distillation mechanism to retain information about unseen/missing classes. We provide empirical evidence that our algorithm surpasses existing methods that use a frozen classifier to boost alignment across diverse data distributions.
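For readers unfamiliar with the frozen classifier mentioned above, the following sketch builds a simplex equiangular tight frame (ETF) using the standard construction $V = \sqrt{C/(C-1)}\, U (I_C - \frac{1}{C}\mathbf{1}\mathbf{1}^\top)$. It is an illustration under stated assumptions: $U$ is taken to be the first $C$ standard basis vectors (which requires the feature dimension to be at least $C$), whereas an actual implementation would more likely use a random orthonormal $U$. The function name and arguments are hypothetical.

```python
import math


def simplex_etf(num_classes, dim):
    """Return num_classes unit vectors in R^dim forming a simplex ETF.

    ASSUMPTION: the orthonormal basis U is the first `num_classes`
    standard basis vectors, so dim >= num_classes is required.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if dim < num_classes:
        raise ValueError("this sketch requires dim >= num_classes")
    scale = math.sqrt(num_classes / (num_classes - 1))
    vectors = []
    for c in range(num_classes):
        v = [0.0] * dim
        for j in range(num_classes):
            # Entry j of scale * (e_c - (1/C) * ones)
            v[j] = scale * ((1.0 if j == c else 0.0) - 1.0 / num_classes)
        vectors.append(v)
    return vectors


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    V = simplex_etf(num_classes=4, dim=8)
    # Every vector has unit norm; every distinct pair has the same
    # inner product, -1 / (C - 1).
    print([round(dot(v, v), 6) for v in V])
    print(round(dot(V[0], V[1]), 6))
```

Because these class vectors are fixed and maximally separated, they never need to be aggregated across clients; only the feature extractor is trained to align with them.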


Paper Structure (31 sections, 3 theorems, 7 equations, 6 figures, 9 tables)


Introduction
Preliminaries
Basic Setup of Conventional FedAvg Pipeline
Dot-Regression Loss for Faster Feature Alignment
When Dot-Regression Loss Meets Federated Learning
Impact of Dot-Regression Loss on Local and Global Models
FedDr+: Dot-Regression and Feature Distillation for Federated Learning
Effect of Feature Distillation
Synergistic Effect with Different Types of FL Algorithms and Regularizers
Experiments and Results
Experimental Setup
Global Federated Learning Results
Personalized Federated Learning Results
Sensitivity Analysis
Related Work
...and 16 more sections

Key Result

Lemma 1

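For all $c,c'\in[C]$ (with $p$ the softmax of the logits $z$), $\frac{\partial p_{c'}(x;\bm{\theta})}{\partial z_c(x;\bm{\theta})}= \begin{cases} p_c(x;\bm{\theta})\,\bigl(1-p_c(x;\bm{\theta})\bigr) & \text{if } c=c', \\ -p_c(x;\bm{\theta})\,p_{c'}(x;\bm{\theta}) & \text{otherwise.} \end{cases}$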
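As a sanity check on this kind of statement, the sketch below compares the standard softmax derivative identity, $\partial p_{c'} / \partial z_c = p_{c'}(\mathbb{1}[c = c'] - p_c)$, against central finite differences. It assumes that $p$ is the softmax of the logits $z$; the function names are hypothetical and the check is numerical, not a proof.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(z):
    """Closed form: J[c_prime][c] = p[c_prime] * (1[c == c_prime] - p[c])."""
    p = softmax(z)
    n = len(z)
    return [
        [p[cp] * ((1.0 if c == cp else 0.0) - p[c]) for c in range(n)]
        for cp in range(n)
    ]


def finite_difference_jacobian(z, eps=1e-6):
    """Central-difference estimate of d p[c_prime] / d z[c]."""
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for c in range(n):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[c] += eps
        z_minus[c] -= eps
        p_plus = softmax(z_plus)
        p_minus = softmax(z_minus)
        for cp in range(n):
            jac[cp][c] = (p_plus[cp] - p_minus[cp]) / (2.0 * eps)
    return jac


def max_abs_difference(a, b):
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


if __name__ == "__main__":
    logits = [0.5, -1.0, 2.0]
    gap = max_abs_difference(softmax_jacobian(logits), finite_difference_jacobian(logits))
    print("max |analytic - numeric| =", gap)
```

The off-diagonal entries are always negative, which is the "pushing" behaviour of cross-entropy that the dot-regression loss discards.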

Figures (6)

Figure 1: Overview of the proposed method, FedDr+ trained with $\mathcal{L}_{\text{Dr+}}$. To enhance the local alignment, we employ dot-regression loss $\mathcal{L}_{\text{DR}}$, which discards the pushing term of cross-entropy loss, and propose a feature distillation $\mathcal{L}_{\text{FD}}$ to preserve the knowledge imbued in the global model. We describe $\mathcal{L}_{\text{DR}}$ in \ref{sec:pre}, and feature distillation in \ref{sec:prob} in detail.
Figure 2: Comparison of (a) feature-classifier alignment and (b) accuracy on the observed and unobserved classes test data for $\bm{\theta}_r^i$ trained with $\mathcal{L}_\text{CE}$ and $\mathcal{L}_\text{DR}$.
Figure 3: Comparison of (a) feature-classifier alignment gap and (b) accuracy gap on the observed and unobserved classes test data for $\bm{\theta}_r^i$ trained with $\mathcal{L}_\text{CE}$ and $\mathcal{L}_\text{DR}$.
Figure 4: We present (a) feature distance, (b) feature angle distance, and (c) feature norm difference from $\bm{\theta}_{r-1}^g$ to $\bm{\theta}_{r}^i$ for observed and unobserved classes by training with $\mathcal{L}_\text{DR}$ and $\mathcal{L}_\text{Dr+}$.
Figure 5: Comparison of alignment/accuracy on the observed and unobserved classes test data for $\bm{\theta}_r^i$ trained with $\mathcal{L}_\text{DR}$ and $\mathcal{L}_\text{Dr+}$.
...and 1 more figure

Theorems & Definitions (6)

Lemma 1
proof
Lemma 2
proof
Proposition 1
proof
