Federated Progressive Self-Distillation with Logits Calibration for Personalized IIoT Edge Intelligence

Yingchao Wang, Wenqi Niu

TL;DR

The paper addresses the dual forgetting challenge in personalized federated learning on non-IID IIoT edge data, where both global generalized knowledge and historical personalized knowledge are forgotten during local updates. It introduces FedPSD, a client-side framework that couples logits calibration with progressive self-distillation to preserve global generalization while recalling historical personalized knowledge. The approach uses a calibrated fusion label $H_k^{t-1} = \alpha P_k^{t-1} + (1-\alpha) Y_k$ with $\alpha = t/t_{\text{total}}$ and a KL-based distillation loss $\mathcal{L}_{KD} = \mathrm{KL}(H_k^{t,e-1} \| P_k^{t,e})$, together with a calibrated cross-entropy loss $\mathcal{L}_{CE}$ derived from $P_{\text{calibrated}}(y|x)$ that accounts for class priors $P(y)$. Experiments on MNIST, CIFAR-10, and CIFAR-100 under pathological sharding and Dirichlet partitions show that FedPSD consistently improves both client and server accuracy and reduces the number of communication rounds needed to reach target performance, with ablations confirming the contribution of each component. The results indicate that FedPSD is a practical, low-overhead solution for personalized IIoT edge intelligence in non-IID FL settings, because it integrates global and local knowledge entirely on the client side.
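To make the two formulas above concrete, here is a minimal sketch of the fusion label and the KL distillation loss for a single sample. It assumes that $P_k^{t-1}$ is a probability vector from the historical personalized model and that $Y_k$ is a one-hot label; the function names `fusion_label` and `kl_divergence` and the toy numbers are illustrative, not taken from the paper.

```python
import math

def fusion_label(prev_probs, one_hot, t, t_total):
    """Return H = alpha * P_prev + (1 - alpha) * Y with alpha = t / t_total."""
    alpha = t / t_total
    return [alpha * p + (1.0 - alpha) * y for p, y in zip(prev_probs, one_hot)]

def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student) for two discrete distributions over classes."""
    return sum(
        h * math.log((h + eps) / (p + eps))
        for h, p in zip(teacher, student)
        if h > 0.0  # terms with zero teacher mass contribute nothing
    )

# Toy values (hypothetical): 3 classes, round t = 5 of t_total = 10, so alpha = 0.5.
prev_probs = [0.7, 0.2, 0.1]   # output of the historical personalized model
one_hot = [1.0, 0.0, 0.0]      # ground-truth label
teacher = fusion_label(prev_probs, one_hot, t=5, t_total=10)

student = [0.6, 0.3, 0.1]      # current model output for the same sample
loss_kd = kl_divergence(teacher, student)
```

Because $\alpha$ grows with the round index, early rounds lean on the ground-truth label and later rounds lean increasingly on the historical model output.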

Abstract

Personalized Federated Learning (PFL) focuses on tailoring models to individual IIoT clients in federated learning by addressing data heterogeneity and diverse user needs. Although existing studies have proposed effective PFL solutions from various perspectives, they overlook the issue of forgetting both historical personalized knowledge and global generalized knowledge during local training on clients. Therefore, this study proposes a novel PFL method, Federated Progressive Self-Distillation (FedPSD), based on logits calibration and progressive self-distillation. We analyze the impact mechanism of client data distribution characteristics on personalized and global knowledge forgetting. To address the issue of global knowledge forgetting, we propose a logits calibration approach for the local training loss and design a progressive self-distillation strategy to facilitate the gradual inheritance of global knowledge, where the model outputs from the previous epoch serve as virtual teachers to guide the training of subsequent epochs. Moreover, to address personalized knowledge forgetting, we construct calibrated fusion labels by integrating historical personalized model outputs, which are then used as teacher model outputs to guide the initial epoch of local self-distillation, enabling rapid recall of personalized knowledge. Extensive experiments under various data heterogeneity scenarios demonstrate the effectiveness and superiority of the proposed FedPSD method.
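The abstract does not spell out the calibration formula, so the sketch below swaps in a common prior-adjusted softmax as an assumption: each logit is shifted by the log of the client's local class prior before normalization. The prior estimate from label counts, the smoothing constant, and all function names are hypothetical choices, not the authors' implementation.

```python
import math
from collections import Counter

def class_priors(labels, num_classes, smoothing=1e-8):
    """Estimate P(y) from local label counts (smoothing avoids log(0))."""
    counts = Counter(labels)
    total = len(labels) + smoothing * num_classes
    return [(counts.get(c, 0) + smoothing) / total for c in range(num_classes)]

def calibrated_log_probs(logits, priors):
    """Log-softmax of logits shifted by log P(y) (assumed calibration form)."""
    adjusted = [z + math.log(p) for z, p in zip(logits, priors)]
    m = max(adjusted)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return [a - log_norm for a in adjusted]

def calibrated_cross_entropy(logits, label, priors):
    """Negative calibrated log-probability of the true class."""
    return -calibrated_log_probs(logits, priors)[label]

# Toy client (hypothetical): 8 samples of class 0, 2 of class 1, none of class 2.
local_labels = [0] * 8 + [1] * 2
priors = class_priors(local_labels, num_classes=3)
loss_major = calibrated_cross_entropy([0.0, 0.0, 0.0], 0, priors)
loss_minor = calibrated_cross_entropy([0.0, 0.0, 0.0], 1, priors)
```

Under this assumed form, a model that emits equal logits pays a higher loss on the locally rare class than on the dominant one, so it has to raise the rare-class logit to compensate, which counteracts the bias introduced by skewed local data.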

Paper Structure

This paper contains 29 sections, 10 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: The framework of federated progressive self-distillation with logits calibration.
  • Figure 2: Class distributions across clients for CIFAR-10 under the two partitioning strategies, shown for 10 randomly selected clients. $S$ denotes the dataset partitioned using the pathological sharding strategy, and $\alpha$ denotes the dataset partitioned using the LDA strategy.
  • Figure 3: Top-1 accuracy during training for different methods, where $S$ denotes the dataset partitioned using the pathological sharding strategy, and $\alpha$ denotes the dataset partitioned using the LDA strategy. The first row shows the accuracy of the globally aggregated model during training, and the second row shows the average client accuracy during training.
  • Figure 4: Average client Top-1 accuracy at different epoch configurations.
  • Figure 5: Average client Top-1 accuracy at different client configurations.
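The two partitioning strategies named in the captions can be sketched as follows. This is a generic illustration, not the authors' exact setup: it assumes that $S$ is the number of single-class shards given to each client and that $\alpha$ is the concentration parameter of a per-class Dirichlet draw. The function names, seeds, and toy label list are hypothetical.

```python
import random

def shard_partition(labels, num_clients, shards_per_client, seed=0):
    """Pathological sharding: sort by label, cut into shards, deal shards out."""
    rng = random.Random(seed)
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_clients * shards_per_client
    size = len(order) // num_shards  # leftover samples, if any, are dropped
    shards = [order[s * size:(s + 1) * size] for s in range(num_shards)]
    rng.shuffle(shards)
    return [
        sum(shards[k * shards_per_client:(k + 1) * shards_per_client], [])
        for k in range(num_clients)
    ]

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """LDA-style split: per class, client shares follow Dirichlet(alpha)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma draws divided by their sum.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        if total == 0.0:  # guard against underflow for very small alpha
            gammas = [0.0] * num_clients
            gammas[rng.randrange(num_clients)] = 1.0
            total = 1.0
        start, cum = 0, 0.0
        for k in range(num_clients):
            cum += gammas[k] / total
            end = len(idxs) if k == num_clients - 1 else int(round(cum * len(idxs)))
            clients[k].extend(idxs[start:end])
            start = max(start, end)
    return clients

# Toy dataset (hypothetical): 1000 samples, 10 balanced classes.
toy_labels = [i % 10 for i in range(1000)]
sharded = shard_partition(toy_labels, num_clients=10, shards_per_client=2)
lda = dirichlet_partition(toy_labels, num_clients=10, alpha=0.5)
```

In this sketch a smaller `alpha` concentrates each class on fewer clients, while a smaller `shards_per_client` limits how many classes a client sees at all.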