Guided Model Merging for Hybrid Data Learning: Leveraging Centralized Data to Refine Decentralized Models

Junyi Zhu; Ruicong Yao; Taha Ceritli; Savas Ozkan; Matthew B. Blaschko; Eunchung Noh; Jeongwon Min; Cho Jung Min; Mete Ozay

TL;DR

This work formalizes a hybrid data regime where centralized and decentralized data coexist and introduces Federated Dual Learning (Feddle), a framework that buffers asynchronous client updates in a model atlas and guides a server-side coefficient search using centralized data. By allowing negative merging coefficients and employing a surrogate loss for out-of-domain data, Feddle achieves faster convergence than traditional FL and prior hybrid approaches, while remaining robust to domain shifts and noise. Theoretical results establish faster convergence under in-domain data and convergent behavior with bounded error for out-of-domain data; comprehensive experiments across multiple datasets demonstrate consistent performance gains and practical viability. The methodology offers a principled, scalable way to harmonize decentralized learning with centralized data resources in real-world, heterogeneous networks.

Abstract

Current network training paradigms primarily focus on either centralized or decentralized data regimes. However, in practice, data availability often exhibits a hybrid nature, where both regimes coexist. This hybrid setting presents new opportunities for model training, as the two regimes offer complementary trade-offs: decentralized data is abundant but subject to heterogeneity and communication constraints, while centralized data, though limited in volume and potentially unrepresentative, enables better curation and high-throughput access. Despite its potential, effectively combining these paradigms remains challenging, and few frameworks are tailored to hybrid data regimes. To address this, we propose a novel framework that constructs a model atlas from decentralized models and leverages centralized data to refine a global model within this structured space. The refined model is then used to reinitialize the decentralized models. Our method synergizes federated learning (to exploit decentralized data) and model merging (to utilize centralized data), enabling effective training under hybrid data availability. Theoretically, we show that our approach achieves faster convergence than methods relying solely on decentralized data, due to variance reduction in the merging process. Extensive experiments demonstrate that our framework consistently outperforms purely centralized, purely decentralized, and existing hybrid-adaptable methods. Notably, our method remains robust even when the centralized and decentralized data domains differ or when decentralized data contains noise, significantly broadening its applicability.
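The guided merging step described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a linear regression model, a squared-error loss on the centralized data, and plain gradient descent over the merging coefficients. The names `merge`, `central_loss`, and `search_coefficients` are hypothetical, and the "atlas" here is simply a matrix whose rows are buffered client updates. The coefficients are left unconstrained, so they may become negative, as mentioned in the TL;DR.

```python
import numpy as np


def merge(global_w, atlas, coeffs):
    """Add a coefficient-weighted sum of buffered client updates to the global model.

    atlas has shape (K, d): one row per buffered client update.
    coeffs has shape (K,) and is unconstrained (negative values allowed).
    """
    return global_w + atlas.T @ coeffs


def central_loss(w, X, y):
    """Mean squared error of a linear model on the centralized data (toy assumption)."""
    residual = X @ w - y
    return float(residual @ residual) / len(y)


def search_coefficients(global_w, atlas, X, y, steps=200):
    """Gradient descent on the centralized loss with respect to the coefficients only."""
    n, K = len(y), atlas.shape[0]
    coeffs = np.full(K, 1.0 / K)  # start from plain averaging
    # The loss is quadratic in coeffs; use the inverse of the largest
    # Hessian eigenvalue as a step size that cannot increase the loss.
    hessian = 2.0 * atlas @ (X.T @ X) @ atlas.T / n
    lr = 1.0 / (np.linalg.eigvalsh(hessian).max() + 1e-12)
    for _ in range(steps):
        w = merge(global_w, atlas, coeffs)
        grad_w = 2.0 * X.T @ (X @ w - y) / n
        grad_c = atlas @ grad_w  # chain rule: d w / d coeffs[k] = atlas[k]
        coeffs = coeffs - lr * grad_c
    return coeffs


# Toy data: the second client update points the wrong way along its axis,
# so the centralized data should assign it a negative coefficient.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, 1.0, 0.0])
y = X @ true_w

global_w = np.zeros(3)
atlas = np.array([[1.0, 0.0, 0.0],
                  [0.0, -1.0, 0.0]])

uniform = np.full(2, 0.5)
coeffs = search_coefficients(global_w, atlas, X, y)

loss_uniform = central_loss(merge(global_w, atlas, uniform), X, y)
loss_guided = central_loss(merge(global_w, atlas, coeffs), X, y)
```

In this toy setup the guided coefficients reach a lower centralized loss than uniform averaging, and the coefficient for the second update is negative, which flips the misdirected update instead of merely down-weighting it. The actual framework additionally handles asynchronous client updates and a surrogate loss for out-of-domain data, neither of which is modeled here.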
