Cross-Silo Federated Learning Across Divergent Domains with Iterative Parameter Alignment

Matt Gorbett; Hossein Shirazi; Indrakshi Ray

TL;DR

This work tackles cross-domain divergence and lack of personalization in cross-silo federated learning by proposing Iterative Parameter Alignment (IPA), which trains $N$ unique peer models in a decentralized topology through a parameter-alignment objective in addition to local losses. By minimizing $\mathcal{L}_i(\mathcal{D}_i;\theta_i)$ jointly with $\mathcal{A}_i(\theta^*)$, IPA enables convergence to a global objective even when peer domains are highly dissimilar, and it naturally yields per-peer models rather than a single global model. The approach demonstrates robustness to domain divergence, achieves competitive or state-of-the-art performance on balanced partitions, and provides a built-in fairness mechanism via early stopping; differential privacy and other protection methods are discussed as optional enhancements. These properties make IPA particularly suitable for privacy-preserving, cross-silo collaborations where data domains are heterogeneous or disjoint, and where model privacy and fairness are important. Overall, IPA offers a flexible, decentralized alternative that supports distinct per-peer models with reliable convergence across varied data distributions.

Abstract

Learning from the collective knowledge of data dispersed across private sources can provide neural networks with enhanced generalization capabilities. Federated learning, a method for collaboratively training a machine learning model across remote clients, achieves this by combining client models via the orchestration of a central server. However, current approaches face two critical limitations: i) they struggle to converge when client domains are sufficiently different, and ii) current aggregation techniques produce an identical global model for each client. In this work, we address these issues by reformulating the typical federated learning setup: rather than learning a single global model, we learn N models each optimized for a common objective. To achieve this, we apply a weighted distance minimization to model parameters shared in a peer-to-peer topology. The resulting framework, Iterative Parameter Alignment, applies naturally to the cross-silo setting, and has the following properties: (i) a unique solution for each participant, with the option to globally converge each model in the federation, and (ii) an optional early-stopping mechanism to elicit fairness among peers in collaborative learning settings. These characteristics jointly provide a flexible new framework for iteratively learning from peer models trained on disparate datasets. We find that the technique achieves competitive results on a variety of data partitions compared to state-of-the-art approaches. Further, we show that the method is robust to divergent domains (i.e. disjoint classes across peers) where existing approaches struggle.
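The weighted distance minimization described above can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the absolute-error (L1) form of the alignment term is an assumption motivated by the Figure 5 caption, and the quadratic local loss, the names `alignment`, `alignment_subgradient`, and `run_toy`, and the `weight` and `lr` values are all hypothetical choices made for illustration.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def alignment(theta_i, peer_thetas, weights):
    """Weighted sum of absolute (L1) distances from theta_i to each peer's parameters.

    Assumption: the exact alignment term used in the paper is not given in this
    excerpt; an L1 distance is used here for illustration only.
    """
    return sum(
        w * sum(abs(a - b) for a, b in zip(theta_i, theta_j))
        for w, theta_j in zip(weights, peer_thetas)
    )


def alignment_subgradient(theta_i, peer_thetas, weights):
    """Subgradient of alignment() with respect to theta_i."""
    grad = [0.0] * len(theta_i)
    for w, theta_j in zip(weights, peer_thetas):
        for k, (a, b) in enumerate(zip(theta_i, theta_j)):
            grad[k] += w * sign(a - b)
    return grad


def run_toy(rounds=200, weight=0.5, lr=0.1):
    """Two peers, each with a hypothetical local loss 0.5 * ||theta - target||^2.

    Each round, every peer takes one gradient step on its local loss plus the
    weighted alignment term, computed against the other peer's latest parameters.
    """
    targets = [[0.0, 0.0], [2.0, 2.0]]
    thetas = [list(t) for t in targets]  # start at each peer's standalone optimum
    initial_gap = alignment(thetas[0], [thetas[1]], [1.0])
    for _ in range(rounds):
        for i in range(2):
            peers = [thetas[j] for j in range(2) if j != i]
            g_align = alignment_subgradient(thetas[i], peers, [weight] * len(peers))
            thetas[i] = [
                t - lr * ((t - c) + g)  # local gradient (t - c) plus alignment pull g
                for t, c, g in zip(thetas[i], targets[i], g_align)
            ]
    final_gap = alignment(thetas[0], [thetas[1]], [1.0])
    return initial_gap, final_gap


if __name__ == "__main__":
    print(run_toy())
```

In this toy setting the peers start an L1 distance of 4.0 apart and settle roughly 2.0 apart: each keeps its own parameters, but the alignment term pulls them toward one another. A real run would replace the quadratic loss with each peer's training loss on its local data.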

Paper Structure (10 sections, 5 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Method
Experiments
Domain Divergent Silos
Comparison to Existing Approaches
Peer Model Comparison
Fairness through Early Stopping
Additional Analysis
Discussion

Figures (5)

Figure 1: Left: Test set accuracy across communication rounds for peers trained with Iterative Parameter Alignment, compared to their standalone performance (trained only on their local data). There are twenty peers, each trained on an imbalanced subset of the CIFAR-10 training set, split by heterogeneous data partitioning with a Dirichlet distribution ($\alpha=0.3$). One communication round (the x-axis) corresponds to each $peer_i$ training its model ($f_i$) once. Center: Three peers, each trained on distinct CIFAR-10 training labels (one peer has 4 labels, two peers have 3 labels each). We find that when peers have sufficiently divergent domains, existing methods fail, creating global models that do not reach baseline accuracy. Iterative Parameter Alignment produces a distinct global model for each peer, each of which converges to baseline accuracy (85% on the test set). Right: A single iteration of Parameter Alignment trained in a ring topology (random topologies are used in experiments). The method relies on parameter exchange and alignment to learn from other peers. $\theta_1,\theta_2,\dots,\theta_N$ are the parameters of the $N$ peers and $f_1,f_2,\dots,f_N$ are their models. $\theta^*$ represents all peer parameters $\{\theta_1,\theta_2,\dots,\theta_N\}$. Each $peer_i$ can optionally apply differential privacy to its $\theta_i$ for protection. Our code is available at https://github.com/mattgorb/iterative_parameter_alignment.
Figure 2: Aligning Peer Models Trained on Disjoint Classes: We find that existing federated learning approaches struggle when trying to merge models trained with divergent (rather than heterogeneous) data partitions. We show that Iterative Parameter Alignment achieves stable training and eventually converges to baseline accuracy, whereas the other methods create global models with unstable performance.
Figure 3: Comparing Peer Models: We measure the distance between peer models across a variety of metrics. Each experiment contains ten peers and is aggregated across three runs, with the mean presented for each. Left: Measuring the distance between models across parameters (first two rows) and model predictions (the last three rows). The last three rows denote the Hamming distance between predictions, mutual correct predictions, and mutual incorrect predictions on the test set. Test set size for both datasets is 10k. Right: A similarity matrix of Hamming distances between peer model predictions for: 1) heterogeneous data partition (bottom triangle) and 2) homogeneous (IID) data partition (top triangle). The distances represent the number of mismatching predictions in the test set for each model. For reference, the lowest (averaged) Hamming distance between models in the IID setting is 880, with a test set size of 10k.
Figure 4: Fairness across iterations: We show that the algorithm creates fair models earlier in training before the global convergence of peer models. CIFAR-10 (Left) and MNIST (Right) performance across models and communication rounds, overlaid with peer models' correlation with their standalone performances (orange). We find that early in training, peer model performances are correlated with the performance of their standalone models relative to other peers. As training proceeds and the peer models globally converge, model fairness decreases, as can be seen in the MNIST figure.
Figure 5: Left: We show the instability of squared error alignment compared to absolute error in a CIFAR-10 experiment with two peers with 5 labels each. Right: We discover similar convergence rates of peers when models have the same initialization (green) and different initializations (orange).
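
Two quantities from the captions above can be sketched in a few lines: the Dirichlet-based heterogeneous split (Figure 1) and the Hamming distance between peer predictions (Figure 3). This is a rough illustration only. The per-class splitting scheme shown here is a common convention and an assumption, since the captions do not specify the authors' exact procedure, and the function names `dirichlet_partition` and `hamming_matrix` are hypothetical.

```python
import random


def dirichlet_partition(labels, num_peers, alpha, seed=0):
    """Split sample indices across peers using per-class Dirichlet(alpha) proportions.

    Assumption: for each class, the share given to each peer is drawn from a
    symmetric Dirichlet distribution. Smaller alpha gives more imbalanced splits.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    peers = [[] for _ in range(num_peers)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws normalized to sum to 1.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_peers)]
        total = sum(draws)
        start, cumulative = 0, 0.0
        for p, g in enumerate(draws):
            cumulative += g / total
            # The last peer takes whatever remains so every index is assigned once.
            end = len(idxs) if p == num_peers - 1 else round(cumulative * len(idxs))
            peers[p].extend(idxs[start:end])
            start = end
    return peers


def hamming_matrix(predictions):
    """Pairwise count of mismatching predictions between peer models.

    predictions[i] is the list of labels predicted by peer i on a shared test set.
    """
    n = len(predictions)
    return [
        [sum(a != b for a, b in zip(predictions[i], predictions[j])) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]  # stand-in for CIFAR-10 labels
    parts = dirichlet_partition(toy_labels, num_peers=20, alpha=0.3)
    print([len(p) for p in parts])
    print(hamming_matrix([[0, 1, 2, 3], [0, 1, 2, 0], [1, 1, 2, 0]]))
```

With twenty peers and `alpha=0.3` the peer subset sizes are typically very uneven, which is the kind of imbalance the Figure 1 caption describes. The Hamming matrix is symmetric with a zero diagonal, so it can be shown as the triangular similarity matrix described for Figure 3.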
