Advocating for the Silent: Enhancing Federated Generalization for Non-Participating Clients
Zheshun Wu, Zenglin Xu, Dun Zeng, Qifan Wang, Jie Liu
TL;DR
This work studies how federated learning (FL) generalizes to non-participating clients under heterogeneous data. It introduces an information-theoretic framework based on a self-information-weighted risk and derives an entropy-aware generalization bound. It then proposes two practical approaches to improve generalization: (i) maximum entropy-based weighting, which emphasizes high-entropy data sources, and (ii) gradient-similarity-based client selection (minimax and convex-hull variants), which diversifies the training distributions. Experiments on EMNIST-10, CIFAR-10/100, Shakespeare, and NICO show improved out-of-distribution performance with these methods, consistent with the theoretical guarantees. By accounting for redundancy across data sources and explicitly targeting non-participating clients, the work offers scalable strategies for generalizing beyond the distributions observed among participating clients.
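The TL;DR does not give the exact weighting rule. The following is a hypothetical stand-in for idea (i), not the paper's method: it assumes a client's "entropy" is the Shannon entropy of its label histogram and sets server aggregation weights proportional to it. The names `label_entropy`, `entropy_weights`, and `weighted_average` are illustrative only.

```python
import math
from typing import Sequence


def label_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (in nats) of a client's empirical label distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def entropy_weights(label_counts: Sequence[Sequence[int]], eps: float = 1e-12) -> list[float]:
    """Hypothetical rule: aggregation weight proportional to label entropy."""
    ents = [label_entropy(c) for c in label_counts]
    total = sum(ents)
    if total < eps:
        # Every client holds a single class, so fall back to uniform weights.
        return [1.0 / len(ents)] * len(ents)
    return [h / total for h in ents]


def weighted_average(params: Sequence[Sequence[float]], weights: Sequence[float]) -> list[float]:
    """FedAvg-style weighted average of flattened client parameter vectors."""
    out = [0.0] * len(params[0])
    for w, p in zip(weights, params):
        for i, v in enumerate(p):
            out[i] += w * v
    return out
```

For example, `entropy_weights([[50, 50], [100, 0], [25, 25]])` gives weights of 0.5, 0.0, and 0.5: the single-class client receives no weight at all, which suggests why a practical scheme would likely smooth or floor these weights.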
Abstract
Federated Learning (FL) has surged in prominence because it enables collaborative model training without direct data sharing. However, the large disparity in local data distributions among clients, often termed the Non-Independent and Identically Distributed (Non-IID) challenge, poses a significant hurdle to FL's generalization. The scenario becomes even more complex when not all clients participate in training, a common occurrence due to unstable network connections or limited computational capacity, which makes it harder to assess how well the trained models generalize. While many recent studies have focused on the generalization gap on unseen data from participating clients with diverse distributions, the distinction between the training distributions of participating clients and the testing distributions of non-participating ones has been largely overlooked. In response, this paper proposes an information-theoretic generalization framework for FL. Specifically, it quantifies generalization errors by measuring the information entropy of local distributions and the discrepancies across these distributions. Guided by the derived generalization bounds, we introduce a weighted aggregation approach and two client selection strategies. These methods aim to improve FL's generalization, so that trained models perform better on non-participating clients, by incorporating a more diverse range of client data distributions into training. Extensive empirical evaluations confirm the effectiveness of the proposed methods, consistent with our theoretical results.
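The abstract does not spell out the client selection rules. A minimal sketch of the gradient-similarity idea, assuming "minimax" means greedily adding the client whose largest cosine similarity to the already-selected clients is smallest, so that redundant update directions are avoided. The function names, the largest-norm seed, and the use of plain Python lists for gradients are hypothetical choices; the convex-hull variant is not shown.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def minimax_select(grads: Sequence[Sequence[float]], k: int) -> list[int]:
    """Greedily pick k clients with mutually dissimilar gradients (hypothetical rule).

    Seeds with the client of largest gradient norm (an arbitrary choice), then
    repeatedly adds the candidate whose maximum cosine similarity to the
    selected set is minimal.
    """
    n = len(grads)
    if not 0 < k <= n:
        raise ValueError("k must be in [1, len(grads)]")
    norms = [math.sqrt(sum(x * x for x in g)) for g in grads]
    selected = [max(range(n), key=lambda i: norms[i])]
    while len(selected) < k:
        best, best_score = -1, math.inf
        for i in range(n):
            if i in selected:
                continue
            # Redundancy of candidate i = its closest match among selected clients.
            score = max(cosine(grads[i], grads[j]) for j in selected)
            if score < best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

With gradients `[[2, 0], [1.8, 0.2], [0, 1], [-1, 0]]` and `k=3`, this returns `[0, 3, 2]`: client 1 is skipped because its update nearly duplicates client 0's. Comparing gradients this way assumes the server holds a recent update from every candidate, which is itself an assumption of this sketch.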
