Homogenizing Non-IID datasets via In-Distribution Knowledge Distillation for Decentralized Learning

Deepak Ravikumar, Gobinda Saha, Sai Aparna Aketi, Kaushik Roy

TL;DR

This work addresses non-IID data across nodes in decentralized learning by introducing In-Distribution Knowledge Distillation (IDKD). IDKD distills the knowledge of each node onto a common public dataset $D_P$. Each node $i$ uses a maximum softmax probability (MSP) based OoD detector to select the in-distribution samples $D^i_{ID}$ from $D_P$, exchanges the corresponding soft labels with its neighbors using label averaging, and then fine-tunes on the union of its private data $D_T^i$ and $D^i_{ID}$. The protocol has five steps: initial training, soft-label generation, OoD calibration, in-distribution subset generation, and label exchange with fine-tuning. IDKD shows improvements of 2–13% over vanilla KD and up to 8% over state-of-the-art decentralized methods, with about 2% communication overhead, across diverse datasets and graph topologies. The results demonstrate that privacy-preserving data homogenization enables more effective decentralized training under heterogeneous data distributions, with scalable performance gains.

Abstract

Decentralized learning enables serverless training of deep neural networks (DNNs) in a distributed manner on multiple nodes. This allows for the use of large datasets, as well as the ability to train with a wide variety of data sources. However, one of the key challenges with decentralized learning is heterogeneity in the data distribution across the nodes. In this paper, we propose In-Distribution Knowledge Distillation (IDKD) to address the challenge of heterogeneous data distribution. The goal of IDKD is to homogenize the data distribution across the nodes. While such data homogenization can be achieved by exchanging data among the nodes at the cost of privacy, IDKD achieves the same objective using a common public dataset across nodes without breaking the privacy constraint. This public dataset is different from the training dataset and is used to distill the knowledge from each node and communicate it to its neighbors through the generated labels. With traditional knowledge distillation, the generalization of the distilled model is reduced because all the public dataset samples are used irrespective of their similarity to the local dataset. Thus, we introduce an Out-of-Distribution (OoD) detector at each node to label a subset of the public dataset that maps close to the local training data distribution. Finally, only the labels corresponding to these subsets are exchanged among the nodes and, with appropriate label averaging, each node is fine-tuned on these data subsets along with its local data. Our experiments on multiple image classification datasets and graph topologies show that the proposed IDKD scheme is more effective than traditional knowledge distillation and achieves state-of-the-art generalization performance on heterogeneously distributed data with minimal communication overhead.
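
The selection and label-averaging steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy logits, the threshold of 0.8, and the choice to average a sample's soft labels over only those nodes that selected it are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_in_distribution(public_logits, threshold):
    """Return {public_index: soft_label} for public samples whose
    maximum softmax probability (MSP) is at least `threshold`."""
    selected = {}
    for idx, logits in enumerate(public_logits):
        probs = softmax(logits)
        if max(probs) >= threshold:
            selected[idx] = probs
    return selected

def average_labels(label_sets):
    """Average soft labels per public index over the nodes that
    selected that index (an assumed averaging rule)."""
    sums, counts = {}, {}
    for labels in label_sets:
        for idx, probs in labels.items():
            if idx not in sums:
                sums[idx] = [0.0] * len(probs)
                counts[idx] = 0
            sums[idx] = [s + p for s, p in zip(sums[idx], probs)]
            counts[idx] += 1
    return {idx: [s / counts[idx] for s in sums[idx]] for idx in sums}

# Hypothetical logits of two nodes on the same three public samples.
node_a_logits = [[4.0, 0.0, 0.0], [0.1, 0.0, 0.2], [0.0, 3.0, 0.0]]
node_b_logits = [[3.0, 0.5, 0.0], [0.0, 0.0, 5.0], [0.2, 0.1, 0.0]]

labels_a = select_in_distribution(node_a_logits, threshold=0.8)
labels_b = select_in_distribution(node_b_logits, threshold=0.8)
merged = average_labels([labels_a, labels_b])
```

In this toy run, node A keeps public samples 0 and 2, node B keeps samples 0 and 1, and only the soft labels for those indices would need to be exchanged. Each node would then fine-tune on its private data together with the public samples in `merged`.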

Paper Structure

This paper contains 7 sections, 3 figures, 9 tables, and 1 algorithm.

Figures (3)

  • Figure 1: A conceptual overview of IDKD homogenization. We visualize an exaggerated latent space of a 2-node decentralized setup with non-IID data having 4 classes. (a) Two similar networks with small differences in decision boundaries due to data skew. (b) Public data visualized in the latent space. (c) Exchange of the IDKD-identified subset using labels only. (d) A more homogeneous data distribution, resulting in better decision boundaries after IDKD training.
  • Figure 2: (a) Overview of the proposed IDKD framework for decentralized training. (i) Each node $i$ trains on its private dataset $D^i_T$ until convergence using a decentralized training algorithm. (ii) Soft labels for the public dataset are generated. (iii) The OoD detector is calibrated. (iv) The OoD detector is used to extract a subset $D^i_{ID}$ of the public dataset that is similar to the private dataset. (v) Soft labels corresponding to $D^i_{ID}$ are exchanged between neighbors, and the models are fine-tuned on the private dataset and the public dataset subset. (b) Visualizing decision boundaries of 2 nodes. For improved KD, we include ID-like data from the public set while excluding low-confidence conflicting examples. (c) The OoD detector used by the IDKD framework. A calibration dataset and the private dataset are used as OoD data and ID data, respectively, to identify the optimal threshold.
  • Figure 3: Visualizing the results of the proposed IDKD method: (a) the data distribution before and after IDKD; (b) the convergence of IDKD compared with DSGDm-N.
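
The threshold calibration described in the caption of Figure 2(c) could be sketched as follows. The caption does not say how the "optimal" threshold is defined, so this example assumes one common criterion, maximizing the true positive rate minus the false positive rate (Youden's J statistic). The function name and the MSP scores are hypothetical.

```python
def calibrate_threshold(id_scores, ood_scores):
    """Pick the MSP threshold that maximizes TPR - FPR.

    id_scores:  MSP values of the model on private (in-distribution) data.
    ood_scores: MSP values of the model on the calibration (OoD) data.
    The TPR - FPR criterion is an assumption, not taken from the paper.
    """
    best_threshold, best_j = None, -1.0
    # Every observed score is a candidate threshold.
    for t in sorted(set(id_scores) | set(ood_scores)):
        tpr = sum(s >= t for s in id_scores) / len(id_scores)
        fpr = sum(s >= t for s in ood_scores) / len(ood_scores)
        j = tpr - fpr
        if j > best_j:  # strict comparison keeps the lowest tied threshold
            best_threshold, best_j = t, j
    return best_threshold, best_j

# Hypothetical MSP scores: the model is confident on private data
# and less confident on the calibration data.
id_scores = [0.95, 0.9, 0.85, 0.8]
ood_scores = [0.7, 0.5, 0.4, 0.6]

threshold, j_stat = calibrate_threshold(id_scores, ood_scores)
```

With these perfectly separable toy scores, the selected threshold is the lowest in-distribution score. Real MSP distributions overlap, in which case the chosen threshold trades off admitting OoD public samples against discarding useful in-distribution ones.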