Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning

Mohammad Mahdi Maheri, Denys Herasymuk, Hamed Haddadi

TL;DR

P4 tackles privacy-aware personalized learning in fully decentralized IoT settings by forming groups of similar clients using a privately computed model-weight similarity, then co-training within each group through differentially private knowledge distillation. The method combines differential privacy (DP), lightweight proxy models, and robust aggregation (m-Krum) with anomaly detection to defend against data and model poisoning while preserving accuracy under non-IID data. Empirical results on CIFAR-10/100 and FEMNIST show 5–30% higher accuracy than leading DP peer-to-peer (P2P) baselines and robustness with up to 30% malicious clients, with only about 7 seconds of overhead for a two-client collaboration on edge devices. The work demonstrates practical applicability for IoT edge deployments, providing a scalable, privacy-preserving, and robust alternative to centralized or non-private P2P learning approaches.
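
To make the robust-aggregation step concrete, here is a minimal sketch of multi-Krum (m-Krum) in plain Python. It follows the commonly cited description of the rule: score each update by its summed squared distance to its closest neighbours, then average the lowest-scoring updates. It is not P4's exact configuration; the function names, the flat-list representation of updates, and the parameters `num_malicious` and `num_selected` are assumptions made for illustration.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two flattened updates."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multi_krum(
    updates: List[Sequence[float]],
    num_malicious: int,   # assumed upper bound on malicious updates (hypothetical parameter)
    num_selected: int,    # how many low-score updates to average (hypothetical parameter)
) -> List[float]:
    """Average the `num_selected` updates with the lowest Krum scores."""
    n = len(updates)
    num_neighbors = n - num_malicious - 2
    if num_neighbors < 1:
        raise ValueError("need at least num_malicious + 3 updates")
    if not 1 <= num_selected <= n:
        raise ValueError("num_selected must be between 1 and len(updates)")

    scores = []
    for i in range(n):
        # Distances from update i to every other update, closest first.
        dists = sorted(
            squared_distance(updates[i], updates[j]) for j in range(n) if j != i
        )
        # Krum score: sum over the closest n - f - 2 neighbours.
        scores.append(sum(dists[:num_neighbors]))

    chosen = sorted(range(n), key=lambda i: scores[i])[:num_selected]
    dim = len(updates[0])
    return [sum(updates[i][d] for i in chosen) / num_selected for d in range(dim)]


if __name__ == "__main__":
    honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.0]]
    poisoned = [[50.0, -50.0]]
    print(multi_krum(honest + poisoned, num_malicious=1, num_selected=3))
```

In this toy run the poisoned update receives by far the largest score, so it is excluded from the average and the result stays close to the honest updates.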

Abstract

The growing adoption of Artificial Intelligence (AI) in Internet of Things (IoT) ecosystems has intensified the need for personalized learning methods that can operate efficiently and privately across heterogeneous, resource-constrained devices. However, enabling effective personalized learning in decentralized settings introduces several challenges, including efficient knowledge transfer between clients, protection of data privacy, and resilience against poisoning attacks. In this paper, we address these challenges by developing P4 (Personalized, Private, Peer-to-Peer) -- a method designed to deliver personalized models for resource-constrained IoT devices while ensuring differential privacy and robustness against poisoning attacks. Our solution employs a lightweight, fully decentralized algorithm to privately detect client similarity and form collaborative groups. Within each group, clients leverage differentially private knowledge distillation to co-train their models, maintaining high accuracy while ensuring robustness to the presence of malicious clients. We evaluate P4 on popular benchmark datasets using both linear and CNN-based architectures across various heterogeneity settings and attack scenarios. Experimental results show that P4 achieves 5% to 30% higher accuracy than leading differentially private peer-to-peer approaches and maintains robustness with up to 30% malicious clients. Additionally, we demonstrate its practicality by deploying it on resource-constrained devices, where collaborative training between two clients adds only ~7 seconds of overhead.
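
The group-formation idea can be illustrated with a small sketch. The Figure 1 caption states that groups are formed from the $\ell_1$-norm dissimilarity of model weights; the code below computes that dissimilarity on plain lists and groups clients greedily. This is a deliberately simplified, non-private, centralized toy: the privacy mechanism and the decentralized protocol of P4 are not reproduced, and the greedy rule, `greedy_groups`, and `group_size` are assumptions introduced here.

```python
from typing import List, Sequence


def l1_dissimilarity(w_a: Sequence[float], w_b: Sequence[float]) -> float:
    """l1-norm of the difference between two flattened weight vectors."""
    return sum(abs(a - b) for a, b in zip(w_a, w_b))


def greedy_groups(weights: List[Sequence[float]], group_size: int) -> List[List[int]]:
    """Group client indices by closeness in l1 dissimilarity.

    Hypothetical rule: take the lowest-index ungrouped client as a seed and
    attach its (group_size - 1) nearest ungrouped clients.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    ungrouped = list(range(len(weights)))
    groups = []
    while ungrouped:
        seed = ungrouped.pop(0)
        # Order remaining clients by dissimilarity to the seed, closest first.
        ungrouped.sort(key=lambda j: l1_dissimilarity(weights[seed], weights[j]))
        groups.append([seed] + ungrouped[: group_size - 1])
        # Restore index order for the clients that are still ungrouped.
        ungrouped = sorted(ungrouped[group_size - 1:])
    return groups


if __name__ == "__main__":
    client_weights = [[0.0, 0.0], [10.0, 10.0], [0.1, 0.0], [10.0, 10.2]]
    print(greedy_groups(client_weights, group_size=2))
```

A real deployment would compare weights without revealing them in the clear, which is the part this sketch omits.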

Paper Structure

This paper contains 52 sections, 37 equations, 17 figures, 2 tables, 2 algorithms.

Figures (17)

  • Figure 1: The overall design of the P4 approach. Clients employ local ($\phi$) and proxy ($\theta$) models, aggregating only the proxy model. Groups are formed based on the $\ell_1$-norm dissimilarity of their model weights. Within each group, updates are exchanged via one client acting as an aggregator, which may change during training to balance communication overhead. After receiving the aggregated model, clients perform local training for personalization.
  • Figure 2: Test accuracy of a linear model on CIFAR-10 with $\epsilon=15$ and alpha-based non-IID setting: (a) $\gamma=25\%$ (b) $\gamma=50\%$ (c) $\gamma=75\%$. For each client, $\gamma\%$ of the data is sampled IID from all classes, while the remaining $(100-\gamma)\%$ comes from a single dominant class.
  • Figure 3: Test accuracy of a CNN on CIFAR-10 with $\epsilon=15$ and alpha-based non-IID setting: (a) $\gamma=25\%$ (b) $\gamma=50\%$ (c) $\gamma=75\%$. For each client, $\gamma\%$ of the data is sampled IID from all classes, while the remaining $(100-\gamma)\%$ comes from a single dominant class.
  • Figure 4: Attack impact on P4 without secure aggregation, before and after grouping, for three non-IID datasets and a linear model with 30% malicious clients.
  • Figure 5: Performance of P4 with different defenses under poisoning attacks with 30% malicious clients.
  • ...and 12 more figures
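
The non-IID split described in the captions of Figures 2 and 3 can be sketched as follows: each client receives a $\gamma$ fraction of its samples drawn uniformly from the whole dataset and the rest from one dominant class. This is an illustrative reading of the captions, not the authors' data pipeline; the function name, the `num_samples` argument, and the choice to draw indices independently per client (so different clients may share samples) are assumptions.

```python
import random
from typing import List


def partition_client(
    labels: List[int],
    dominant_class: int,
    num_samples: int,
    gamma: float,          # fraction in [0, 1] sampled IID from all classes
    rng: random.Random,
) -> List[int]:
    """Return dataset indices for one client under the gamma-based split."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be a fraction between 0 and 1")
    num_iid = round(gamma * num_samples)
    num_dominant = num_samples - num_iid

    # Draw the dominant-class share first.
    dominant_pool = [i for i, y in enumerate(labels) if y == dominant_class]
    dominant_idx = rng.sample(dominant_pool, num_dominant)

    # Draw the IID share from the remaining indices, all classes included.
    taken = set(dominant_idx)
    iid_pool = [i for i in range(len(labels)) if i not in taken]
    iid_idx = rng.sample(iid_pool, num_iid)
    return iid_idx + dominant_idx


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]  # 10 balanced toy classes
    idx = partition_client(toy_labels, dominant_class=3, num_samples=100,
                           gamma=0.25, rng=random.Random(0))
    share = sum(toy_labels[i] == 3 for i in idx) / len(idx)
    print(f"dominant-class share: {share:.2f}")
```

With $\gamma=25\%$ the dominant class accounts for at least 75% of the client's samples, plus whatever the IID portion happens to draw from that class.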