The Diversity Bonus: Learning from Dissimilar Distributed Clients in Personalized Federated Learning

Xinghao Wu; Xuefeng Liu; Jianwei Niu; Guogang Zhu; Shaojie Tang; Xiaotian Li; Jiannong Cao

TL;DR

The paper tackles non-IID challenges in personalized federated learning by enabling clients to benefit from dissimilar distributions. It introduces DiversiFed, which uses a distance-based attraction–repulsion loss in parameter space and an incremental proximal optimization to coordinate server and client updates, producing personalized models that both cluster with similar clients and separate from dissimilar ones. The authors provide convergence guarantees under strong convexity and validate their method across natural and medical datasets, showing improved performance over state-of-the-art methods, especially in highly non-IID settings. The approach offers enhanced generalization, robustness to partial participation, and scalable communication, highlighting its practical impact for real-world distributed learning with privacy constraints.

Abstract

Personalized Federated Learning (PFL) is a commonly used framework that allows clients to collaboratively train their personalized models. PFL is particularly useful when data from different clients are not independent and identically distributed (non-IID). Previous research in PFL implicitly assumes that clients benefit more from clients with similar data distributions. Correspondingly, methods such as personalized weight aggregation assign higher weights to similar clients during training. We pose a question: can a client benefit from other clients with dissimilar data distributions, and if so, how? This question is particularly relevant in highly non-IID scenarios, where clients have widely different data distributions and learning only from similar clients discards knowledge from many other clients. We note that, for clients with similar data distributions, methods such as personalized weight aggregation tend to force their models to be close in the parameter space. It is reasonable to conjecture that a client can benefit from dissimilar clients if we allow their models to move apart from each other. Based on this idea, we propose DiversiFed, which allows each client to learn from clients with diverse data distributions in personalized federated learning. DiversiFed pushes the personalized models of clients with dissimilar data distributions apart in the parameter space while pulling together those with similar distributions. In addition, to achieve this effect without prior knowledge of the data distributions, we design a loss function that uses model similarity to determine the degree of attraction and repulsion between any two models. Experiments on several datasets show that DiversiFed can benefit from dissimilar clients and thus outperforms state-of-the-art methods.
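To make the idea of model similarity in parameter space more concrete, the sketch below computes the scaled distance $d_j = \|w_i - w_j\| / \tau$ that appears in Lemma 1 and turns it into normalized similarity weights. It is only an illustration under stated assumptions: the softmax weighting, the function names `scaled_distances` and `similarity_weights`, and the toy parameter vectors are hypothetical and are not taken from the paper's loss function.

```python
import math

def scaled_distances(models, i, tau=1.0):
    """Return {j: ||w_i - w_j|| / tau} for every client j != i."""
    w_i = models[i]
    return {
        j: math.sqrt(sum((a - b) ** 2 for a, b in zip(w_i, w_j))) / tau
        for j, w_j in enumerate(models)
        if j != i
    }

def similarity_weights(distances):
    """Softmax over negative scaled distances (an assumed weighting).

    Closer models receive larger weights; the weights sum to 1.
    """
    shift = min(distances.values())  # subtract the minimum for numerical stability
    exps = {j: math.exp(-(d - shift)) for j, d in distances.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}

# Toy flattened parameter vectors for three hypothetical clients A, B, C.
models = [
    [5.0, 0.0],  # A: far from B and C
    [0.0, 0.0],  # B
    [0.2, 0.0],  # C: close to B
]
d_B = scaled_distances(models, i=1, tau=1.0)  # distances from B to A and C
s_B = similarity_weights(d_B)                 # C gets a larger weight than A
```

In this toy setup, client B would treat C as similar and A as dissimilar purely from model parameters, without access to any client's data distribution. The temperature `tau` controls how sharply the weights separate near and far models.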

Paper Structure (23 sections, 2 theorems, 15 equations, 9 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Traditional Federated Learning
Personalized Federated Learning
Learning from Both Similar and Dissimilar Clients in Personalized Federated Learning
Overview of DiversiFed
PFL Problem Definition
Designing the Loss Function in DiversiFed
An Incremental Proximal Method to Minimize the Loss Function
DiversiFed: Convergence Analysis
Experiments
Dataset Settings
Comparison Methods and Implementation Details
Comparing with SOTA on the Pathological non-IID Setting
Comparing with SOTA on the Dirichlet non-IID Setting
...and 8 more sections

Key Result

Lemma 1

If we optimize the objective loss function with the server update (Eq. server optimize) and the client update (Eq. client optimize), and let $d_j=\frac{\|w_i - w_j\|}{\tau}$, we have

Figures (9)

Figure 1: An experiment to verify the effect of pulling personalized models together and pushing personalized models apart when the data distribution is very different.
Figure 2: A toy example to show the training process of DiversiFed. Similar clients (i.e., B and C) are pulled together and the dissimilar client (i.e., A) is pushed apart.
Figure 3: The system workflow of DiversiFed.
Figure 4: A toy example to show the effect of $L_{d}$ in Eq. (example of cl loss). (a) shows the effect of item 1. (b) shows the effect of item 2. (c) shows the combined effect of item 1 and item 2.
Figure 5: Data allocation for each client under different values of $\alpha$ for the Dirichlet distribution. The horizontal axis represents the client ID and the vertical axis represents the class label index. Red dots represent the data assigned to clients. The larger the dot, the more data the client has in that class.
...and 4 more figures
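
The Dirichlet allocation described in the Figure 5 caption can be illustrated with a commonly used label-partitioning scheme. The sketch below is an assumption about how such a split is typically generated, not the paper's exact protocol: for each class, client proportions are drawn from a symmetric Dirichlet distribution with parameter $\alpha$, so a smaller $\alpha$ concentrates each class on fewer clients. The function name `dirichlet_partition` and the toy labels are hypothetical.

```python
import random

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, class by class.

    For each class, client proportions are drawn from a symmetric
    Dirichlet(alpha) distribution. Smaller alpha gives a more skewed split.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet sample is a set of normalized Gamma(alpha, 1) draws.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas) or 1.0  # guard against an all-zero draw
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += gammas[c] / total
            # The last client takes the remainder so every index is assigned.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients

# Hypothetical labels: 10 classes with 100 samples each.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_clients=5, alpha=0.1)
```

Each index is assigned to exactly one client, and the fixed `seed` makes the split reproducible. Rerunning with a larger `alpha` (for example 10.0) should give noticeably more balanced class counts per client.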

Theorems & Definitions (4)

Lemma 1
proof
Remark 1
Theorem 1
