FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments

Anik Pramanik, Murat Kantarcioglu, Vincent Oria, Shantanu Sharma

TL;DR

FedDAG is a clustered FL framework that employs a weighted, class-wise similarity metric integrating both data and gradient information, providing a more holistic measure of client similarity during clustering.

Abstract

Federated Learning (FL) enables a group of clients to collaboratively train a model without sharing individual data, but its performance drops when client data are heterogeneous. Clustered FL tackles this by grouping similar clients. However, existing clustered FL approaches rely solely on either data similarity or gradient similarity, which results in an incomplete assessment of client similarity. Prior clustered FL approaches also restrict knowledge and representation sharing to clients within the same cluster, which prevents cluster models from benefiting from the diverse client population across clusters. To address these limitations, FedDAG, a clustered FL framework, employs a weighted, class-wise similarity metric that integrates both data and gradient information, providing a more holistic measure of similarity during clustering. In addition, FedDAG adopts a dual-encoder architecture for cluster models, comprising a primary encoder trained on its own clients' data and a secondary encoder refined using gradients from complementary clusters. This enables cross-cluster feature transfer while preserving cluster-specific specialization. Experiments on diverse benchmarks and data heterogeneity settings show that FedDAG consistently outperforms state-of-the-art clustered FL baselines in accuracy.
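
To make the idea of a weighted, class-wise similarity that blends data and gradient information more concrete, here is a minimal sketch. It is not the paper's actual metric: the use of cosine similarity, the per-class summary vectors, the mixing weight `lam`, and the function names are all assumptions introduced for illustration only.

```python
import numpy as np


def cosine(u, v, eps=1e-12):
    """Cosine similarity of two 1-D vectors; returns 0.0 if either is near zero."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu < eps or nv < eps:
        return 0.0
    return float(np.dot(u, v) / (nu * nv))


def combined_similarity(data_a, data_b, grad_a, grad_b, class_weights, lam=0.5):
    """Hypothetical weighted, class-wise blend of data and gradient similarity.

    data_a, data_b: arrays of shape (num_classes, d_data), one summary vector
        per class for each client (assumed representation).
    grad_a, grad_b: arrays of shape (num_classes, d_grad), one gradient
        summary per class for each client (assumed representation).
    class_weights: non-negative weights with a positive sum, one per class;
        normalized here so they sum to 1.
    lam: assumed mixing weight in [0, 1]; 1.0 uses data similarity only,
        0.0 uses gradient similarity only.
    """
    w = np.asarray(class_weights, dtype=float)
    w = w / w.sum()
    total = 0.0
    for c in range(len(w)):
        s_data = cosine(data_a[c], data_b[c])
        s_grad = cosine(grad_a[c], grad_b[c])
        total += w[c] * (lam * s_data + (1.0 - lam) * s_grad)
    return total


# Two toy clients with two classes: per-class data summaries are orthogonal,
# while per-class gradient summaries are identical.
data_a = np.array([[1.0, 0.0], [0.0, 1.0]])
data_b = np.array([[0.0, 1.0], [1.0, 0.0]])
grads = np.array([[1.0, 2.0], [3.0, 4.0]])
score = combined_similarity(data_a, data_b, grads, grads, class_weights=[1, 3])
```

In this toy case a data-only metric would call the two clients dissimilar and a gradient-only metric would call them identical; the blended score falls between the two, which is the kind of incompleteness the abstract attributes to single-signal clustering.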

Paper Structure (46 sections, 1 theorem, 58 equations, 5 figures, 19 tables, 4 algorithms)

Key Result

Theorem A.1

Let the assumptions above hold. Choose learning rates $\eta=\tau/(L E)$ and $\eta_{\mathrm{share}}=\Theta(1/L)$, for a constant $\tau$ depending on $L$, the variance terms, heterogeneity, and participation. Then, ignoring absolute constants and provided clustering is stable, where the effective variance terms are

Figures (5)

  • Figure 1: Overview of FedDAG. Clients compute principal vectors and gradients to build an adjacency matrix and a graph indicating which clusters can supply features for cross-cluster sharing. Training proceeds in two phases: (1) the primary encoder and classifier are trained on each cluster's local data; (2) the secondary encoder of a requesting cluster is trained on the source cluster's data.
  • Figure 2: Exp 2: Clustering score vs. cluster $\alpha$ and number of clusters for finding the optimal clustering.
  • Figure 3: Exp 6: Accuracy vs. number of comm. rounds, Data Distribution I, non-IID (30%), $\alpha'$=1.
  • Figure 4: Effect of the federated-aware clustering loss on the adaptive clustering mechanism under Data Distribution I ($30\%$ label skew, Dirichlet $\alpha' = 0.25$) for CIFAR-10 and SVHN.
  • Figure 5: Behavior of FedDAG's adaptive clustering mechanism in a setting with an inherently high number of clusters (e.g., $> 10$ ground-truth distributions).

Theorems & Definitions (1)

  • Theorem A.1: Convergence of FedDAG (per-cluster globals, dual encoders)