Guided Model Merging for Hybrid Data Learning: Leveraging Centralized Data to Refine Decentralized Models

Junyi Zhu, Ruicong Yao, Taha Ceritli, Savas Ozkan, Matthew B. Blaschko, Eunchung Noh, Jeongwon Min, Cho Jung Min, Mete Ozay

TL;DR

This work formalizes a hybrid data regime where centralized and decentralized data coexist and introduces Federated Dual Learning (Feddle), a framework that buffers asynchronous client updates in a model atlas and guides a server-side coefficient search using centralized data. By allowing negative merging coefficients and employing a surrogate loss for out-of-domain data, Feddle achieves faster convergence than traditional FL and prior hybrid approaches, while remaining robust to domain shifts and noise. Theoretical results establish faster convergence under in-domain data and convergent behavior with bounded error for out-of-domain data; comprehensive experiments across multiple datasets demonstrate consistent performance gains and practical viability. The methodology offers a principled, scalable way to harmonize decentralized learning with centralized data resources in real-world, heterogeneous networks.

Abstract

Current network training paradigms primarily focus on either centralized or decentralized data regimes. However, in practice, data availability often exhibits a hybrid nature, where both regimes coexist. This hybrid setting presents new opportunities for model training, as the two regimes offer complementary trade-offs: decentralized data is abundant but subject to heterogeneity and communication constraints, while centralized data, though limited in volume and potentially unrepresentative, enables better curation and high-throughput access. Despite its potential, effectively combining these paradigms remains challenging, and few frameworks are tailored to hybrid data regimes. To address this, we propose a novel framework that constructs a model atlas from decentralized models and leverages centralized data to refine a global model within this structured space. The refined model is then used to reinitialize the decentralized models. Our method synergizes federated learning (to exploit decentralized data) and model merging (to utilize centralized data), enabling effective training under hybrid data availability. Theoretically, we show that our approach achieves faster convergence than methods relying solely on decentralized data, due to variance reduction in the merging process. Extensive experiments demonstrate that our framework consistently outperforms purely centralized, purely decentralized, and existing hybrid-adaptable methods. Notably, our method remains robust even when the centralized and decentralized data domains differ or when decentralized data contains noise, significantly broadening its applicability.
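The server-side step described above, refining a global model inside the space spanned by buffered client updates, can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a linear model, a mean-squared-error loss, and plain gradient descent over the coefficients, and every name in it (`mse_and_grad`, `merge`, `search_coefficients`) is hypothetical. It only shows the idea of holding the client updates fixed and fitting unconstrained (possibly negative) merging coefficients on centralized data.

```python
import random


def mse_and_grad(w, data):
    """Mean squared error of a linear model and its gradient w.r.t. w."""
    n = len(data)
    loss, grad = 0.0, [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        loss += err * err / n
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / n
    return loss, grad


def merge(w, atlas, coeffs):
    """Return w + sum_m coeffs[m] * atlas[m] (atlas holds client updates)."""
    merged = list(w)
    for c, delta in zip(coeffs, atlas):
        for i, d in enumerate(delta):
            merged[i] += c * d
    return merged


def search_coefficients(w, atlas, server_data, steps=500, lr=0.1):
    """Fit merging coefficients on server data; w and the atlas stay fixed.

    Assumption: plain gradient descent stands in for whatever search
    procedure the paper actually uses.
    """
    coeffs = [1.0 / len(atlas)] * len(atlas)  # start from a uniform average
    for _ in range(steps):
        _, grad_w = mse_and_grad(merge(w, atlas, coeffs), server_data)
        # Chain rule: dL/dc_m = <grad_w, delta_m>. No sign constraint on c_m.
        for m, delta in enumerate(atlas):
            coeffs[m] -= lr * sum(g * d for g, d in zip(grad_w, delta))
    return coeffs


if __name__ == "__main__":
    rng = random.Random(0)
    w_true = [1.0, -2.0]  # hypothetical target used to label the server data
    data = []
    for _ in range(50):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, sum(a * b for a, b in zip(w_true, x))))
    w = [0.0, 0.0]
    atlas = [[1.0, 0.0], [0.0, 1.0]]  # second update points the wrong way
    print(search_coefficients(w, atlas, data))
```

In this toy setup the second client update points away from the target, so the fitted coefficient for it comes out negative, something a convex average of the updates could not express.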

Paper Structure

This paper contains 36 sections, 3 theorems, 34 equations, 13 figures, 8 tables, 1 algorithm.

Key Result

Theorem 1

Suppose the above assumptions hold and $\mathcal{D}_S$ represents the in-domain data. In addition, suppose each client's delay is bounded by $\tau_{\max}$ and Feddle's merging coefficients satisfy $|\hat{c}_m| < \hat{c}_{\max}$. Then Feddle converges at least as fast as FedBuff and where $r_{FL}$ is the rate of FedBuff or FedAvg, $C_T = A_0\left(1-\frac{1}{4^T}\right)$, and $A_

Figures (13)

  • Figure 1: Illustration of data regimes. (a) Centralized regime: all data is aggregated at the server for training. (b) Decentralized regime: data remains distributed across clients, which train local models and share updates with the server. (c) Hybrid regime: decentralized learning is performed while a centralized dataset is concurrently available to assist the training process. We follow standard FL terminology, referring to the central node as the server and the distributed nodes as clients.
  • Figure 2: Statistics of model updates in FL under varying degrees of data heterogeneity, simulated using a Dirichlet distribution (denoted Dir$(\cdot)$) following previous work (Yurochkin et al., 2019). Subplot (a) displays mean values, with bands representing the maximum and minimum values.
  • Figure 3: Illustration of synchronous (a) and asynchronous (b) communication in FL. Downlink is simplified for clarity.
  • Figure 4: Overview of the Feddle framework. The server coordinates clients' local training using an asynchronous mechanism. The model atlas is updated with the clients' model updates and is then used to conduct a coefficient search that optimizes the global model.
  • Figure 5: Convergence plots of ResNet18 on CIFAR100 with Dir(0.1), $\mathcal{N}(20)$. More plots are provided in the appendix.
  • ...and 8 more figures
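
The Dirichlet-based heterogeneity mentioned in the Figure 2 caption is commonly simulated by splitting each class across clients with proportions drawn from Dir(α): a small α such as 0.1 gives each client only a few dominant classes, while a large α approaches a uniform split. The sketch below is one common variant of that scheme, not necessarily the exact protocol used in the paper. The function name `dirichlet_partition` and its parameters are hypothetical, and it uses only the standard library, sampling the Dirichlet by normalizing Gamma draws.

```python
import random


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, class by class, with Dir(alpha) proportions."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalized.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas) or 1.0  # guard against an all-zero underflow
        start, cum = 0, 0.0
        for k in range(num_clients):
            cum += gammas[k] / total
            if k == num_clients - 1:
                end = len(idxs)  # last client takes the remainder
            else:
                end = min(len(idxs), int(cum * len(idxs)))
            clients[k].extend(idxs[start:end])
            start = end
    return clients


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]  # 10 balanced classes
    parts = dirichlet_partition(toy_labels, num_clients=5, alpha=0.1)
    print([len(p) for p in parts])
```

Every index is assigned to exactly one client, and fixing `seed` makes the split reproducible, which helps when comparing methods under the same heterogeneity level.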

Theorems & Definitions (8)

  • Theorem 1: In-domain data
  • Remark 1
  • Theorem 2: Out-of-domain data
  • Remark 2
  • Lemma 1
  • proof
  • proof: Proof of Theorem 1
  • proof: Proof of Theorem 2