MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities

Kunxi Li; Tianyu Zhan; Kairui Fu; Shengyu Zhang; Kun Kuang; Jiwei Li; Zhou Zhao; Fan Wu; Fei Wu

TL;DR

MergeNet tackles heterogeneous knowledge transfer across diverse model architectures, tasks, and modalities by moving beyond direct parameter sharing to a parameter-space bridge built with low-rank re-encoding and a Low-rank Parametric Knowledge Adapter (LPKA). It introduces a stack of knowledge transfer layers (KTL) that progressively fuse knowledge from source to target via parameter re-encoding and attention over low-rank factors, trained jointly with the networks in a two-phase process that interleaves knowledge transfer with self-learning under a knowledge transfer cycle $T_{cycle}$. The approach is validated across cross-structural, cross-modal, cross-task, and self/frozen-model settings, consistently outperforming traditional knowledge distillation baselines and demonstrating robust transfer in challenging heterogeneous scenarios. This work enables practical knowledge migration for edge and resource-constrained deployments by allowing small systems to inherit complex capabilities from large, diverse models while keeping overhead low at inference time.
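The interleaving of knowledge transfer with self-learning can be pictured as a simple step schedule. The sketch below is only an illustration under stated assumptions: the summary does not say how $T_{cycle}$ maps steps to phases, so the rule used here (one transfer step closing every window of `t_cycle` steps) and the names `phase_for_step` and `build_schedule` are hypothetical, not the authors' implementation.

```python
def phase_for_step(step: int, t_cycle: int) -> str:
    """Return the phase of a 0-indexed training step under a hypothetical rule."""
    if t_cycle < 1:
        raise ValueError("t_cycle must be a positive integer")
    # Assumption: the last step of every window of t_cycle steps is a
    # knowledge-transfer step; all other steps are ordinary self-learning.
    return "transfer" if (step + 1) % t_cycle == 0 else "self_learning"


def build_schedule(num_steps: int, t_cycle: int) -> list:
    """List the phase of each step for a run of num_steps steps."""
    return [phase_for_step(step, t_cycle) for step in range(num_steps)]


if __name__ == "__main__":
    # With t_cycle = 4, steps 3 and 7 are transfer steps.
    for step, phase in enumerate(build_schedule(8, 4)):
        print(step, phase)
```

Under this reading, a smaller `t_cycle` means more frequent transfer steps, and `t_cycle = 1` would make every step a transfer step.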

Abstract

In this study, we focus on heterogeneous knowledge transfer across entirely different model architectures, tasks, and modalities. Existing knowledge transfer methods (e.g., backbone sharing, knowledge distillation) often hinge on shared elements within model structures or on task-specific features and labels, which limits their applicability to complex model types or tasks. To overcome these challenges, we present MergeNet, which learns to bridge the gap between the parameter spaces of heterogeneous models, facilitating the direct interaction, extraction, and application of knowledge within these parameter spaces. The core mechanism of MergeNet is the parameter adapter, which queries the source model's low-rank parameters and learns to identify and map parameters into the target model. MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage, including the training trajectory knowledge of the source model. Extensive experiments on heterogeneous knowledge transfer demonstrate significant improvements in challenging settings, where representative approaches may falter or prove less applicable.
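To make the idea of querying a source model's low-rank parameters more concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made for illustration: the low-rank factors come from a truncated SVD, the target factor rows act as attention queries while the source factor rows act as keys and values, the projection matrices are random rather than learned, and the result is added back to the target factor as a residual. All function names are hypothetical.

```python
import numpy as np


def low_rank_factors(weight, rank):
    """Truncated SVD so that weight is approximated by a @ b."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    root = np.sqrt(s[:rank])
    a = u[:, :rank] * root            # shape (m, rank)
    b = root[:, None] * vt[:rank, :]  # shape (rank, n)
    return a, b


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def adapter_step(target_w, source_w, rank, d_attn, rng):
    """Fuse source knowledge into the target weight via attention over factors."""
    a_t, b_t = low_rank_factors(target_w, rank)  # target factors
    a_s, _ = low_rank_factors(source_w, rank)    # source factors
    # Assumption: random projections stand in for learned adapter weights.
    w_q = rng.normal(scale=rank ** -0.5, size=(rank, d_attn))
    w_k = rng.normal(scale=rank ** -0.5, size=(rank, d_attn))
    w_v = rng.normal(scale=rank ** -0.5, size=(rank, rank))
    q = a_t @ w_q                                # (m_t, d_attn)
    k = a_s @ w_k                                # (m_s, d_attn)
    v = a_s @ w_v                                # (m_s, rank)
    attn = softmax(q @ k.T / np.sqrt(d_attn))    # (m_t, m_s), rows sum to 1
    a_merged = a_t + attn @ v                    # residual update of target factor
    return a_merged @ b_t, attn                  # weight in the target's own shape


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(6, 4))    # small "target" layer
    source = rng.normal(size=(10, 8))   # differently shaped "source" layer
    merged, attn = adapter_step(target, source, rank=2, d_attn=3, rng=rng)
    print(merged.shape, attn.shape)
```

Because attention runs over the rows of the low-rank factors, the source and target layers need not share a shape; they only need a common rank, which is what makes this kind of bridge plausible for heterogeneous architectures.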

Paper Structure (31 sections, 7 equations, 3 figures, 8 tables)

Introduction
Related Work
Method
Problem Formulation
MergeNet
Parameter Re-Encode.
Low-rank Parametric Knowledge Adapter.
Training Process.
Experiments
Cross-Structural Knowledge Transfer
Implementation Details.
Cross-Structural Knowledge Transfer Results.
Cross-Modal Knowledge Transfer
Implementation Details.
Cross-Modal Knowledge Transfer Results.
...and 16 more sections

Figures (3)

Figure 1: (a)-(c): Compare knowledge distillation, backbone sharing, and our proposed MergeNet. The orange arrows represent the flow of knowledge. (d): The parameter sharing method is ineffective for heterogeneous knowledge transfer, and in fact, may lead to a loss of accuracy due to the incompatibility of knowledge.
Figure 2: Overview of MergeNet. MergeNet takes parameters from different models as inputs and generates parameters that integrate knowledge from these models, where more knowledge transfer layers indicate a greater amount of knowledge transferred. It is important to note that the descriptions in (b) and (c) are based on the knowledge transfer from larger model to smaller model, but the process from smaller model to larger model is completely symmetrical.
Figure 3: Ablation with respect to $T_{cycle}$.
