Synergy over Discrepancy: A Partition-Based Approach to Multi-Domain LLM Fine-Tuning

Hua Ye, Siyuan Chen, Haoliang Zhang, Weihao Luo, Yanbin Li, Xuan Zhang

TL;DR

The paper tackles multi-domain fine-tuning of large language models by introducing a partition-based, multi-stage framework that clusters domains to maximize inter-domain synergy while controlling discrepancy and parameter budgets. The approach is supported by theoretical generalization bounds that incorporate domain discrepancy, synergy, and adapter/backbone capacity, and by an algorithm that efficiently partitions domains and performs stage-wise PEFT fine-tuning. Empirically, the method consistently exceeds baselines across four language-understanding tasks and multiple backbones, while reducing memory footprint and improving convergence. This synergy-aware partitioning offers a scalable, robust path for deploying LLMs across diverse domains, with strong implications for practical multi-domain adaptation and continual learning.

Abstract

Large language models (LLMs) demonstrate impressive generalization abilities, yet adapting them effectively across multiple heterogeneous domains remains challenging due to inter-domain interference. To overcome this challenge, we propose a partition-based multi-stage fine-tuning framework designed to exploit inter-domain synergies while minimizing negative transfer. Our approach strategically partitions domains into subsets (stages) by balancing domain discrepancy, synergy, and model capacity constraints. We theoretically analyze the proposed framework and derive novel generalization bounds that justify our partitioning strategy. Extensive empirical evaluations on various language understanding tasks show that our method consistently outperforms state-of-the-art baselines.
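The abstract describes partitioning domains into stages by trading off synergy, discrepancy, and capacity. The following is only a minimal greedy sketch of that idea, not the paper's algorithm: the pairwise `synergy` and `discrepancy` scores, the trade-off weight `lam`, the `max_per_stage` cap (a crude stand-in for the capacity constraint), and all domain names used with it are assumptions introduced for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def pair_score(a: str, b: str,
               synergy: Dict[Pair, float],
               discrepancy: Dict[Pair, float],
               lam: float) -> float:
    """Hypothetical affinity of two domains: synergy minus weighted discrepancy."""
    key = tuple(sorted((a, b)))  # scores are stored once per unordered pair
    return synergy[key] - lam * discrepancy[key]


def greedy_partition(domains: List[str],
                     synergy: Dict[Pair, float],
                     discrepancy: Dict[Pair, float],
                     max_per_stage: int,
                     lam: float = 1.0) -> List[List[str]]:
    """Assign each domain to the stage where it adds the most total affinity.

    A domain opens a new stage when every existing stage is full or when
    joining any of them would not yield a positive gain.
    """
    stages: List[List[str]] = []
    for d in domains:
        best_idx, best_gain = None, 0.0
        for i, stage in enumerate(stages):
            if len(stage) >= max_per_stage:  # assumed capacity proxy
                continue
            gain = sum(pair_score(d, m, synergy, discrepancy, lam) for m in stage)
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx is None:
            stages.append([d])
        else:
            stages[best_idx].append(d)
    return stages
```

Each resulting stage would then be fine-tuned in turn with a parameter-efficient method. The greedy pass depends on the order of `domains` and carries no optimality guarantee; it is meant only to make the synergy-versus-discrepancy trade-off concrete.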

Paper Structure

This paper contains 54 sections, 5 theorems, 26 equations, 3 figures, 21 tables, 1 algorithm.

Key Result

Lemma 3.1

Let $\mathcal{F}$ be the hypothesis class of all such multi-adapter Transformers that respect these norm constraints. Then for $n$ i.i.d. samples per domain from $k$ source domains $\{\mathcal{D}_1,\dots,\mathcal{D}_k\}$, there exists a constant $C_{\text{T}} > 0$ (depending on $L$, $\Omega_{\text{co}\dots}$ … indicating that limiting both the core Transformer parameters and the adapter parameters yields a c…
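For orientation, a Rademacher complexity bound of this kind is typically used through the standard textbook generalization inequality below. This is the generic result for a loss bounded in $[0,1]$, not the statement of Lemma 3.1; the symbols $\ell$, $\mathfrak{R}_n$, and $\delta$ are notation assumed here rather than taken from the paper.

```latex
% Textbook Rademacher bound; assumes a loss \ell taking values in [0,1].
% \mathfrak{R}_n is the Rademacher complexity of the loss class
% \ell \circ \mathcal{F} on n i.i.d. samples z_1, ..., z_n drawn from \mathcal{D};
% \delta \in (0,1) is the allowed failure probability.
\[
  \forall f \in \mathcal{F}:\quad
  \mathbb{E}_{z \sim \mathcal{D}}\bigl[\ell(f; z)\bigr]
  \;\le\;
  \frac{1}{n}\sum_{i=1}^{n} \ell(f; z_i)
  \;+\; 2\,\mathfrak{R}_n(\ell \circ \mathcal{F})
  \;+\; \sqrt{\frac{\ln(1/\delta)}{2n}}
  \qquad \text{with probability at least } 1-\delta .
\]
```

Under this reading, any constraint that shrinks the complexity term, such as norm limits on the backbone and adapter parameters, tightens the gap between training and expected loss.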

Figures (3)

  • Figure 1: Training loss curves on the Q&A domain with LLaMA2-13B.
  • Figure 2: Synergy metric sensitivity.
  • Figure 3: Domain discrepancy heatmaps across three dimensions: Token Distribution, Vocabulary Overlap, and Semantic Similarity.

Theorems & Definitions (15)

  • Definition 1: Multi-Source Fine-Tuned Model
  • Definition 2: Domain Discrepancy
  • Lemma 3.1: Rademacher Complexity for Multi-Adapter Transformers
  • Proof
  • Theorem 3.1: Multi-Source Concurrent Generalization
  • Proof: Proof Sketch
  • Remark 3.1: Domain Similarity vs. Model Capacity
  • Theorem 3.2: Multi-Stage Partition with Synergy-Capacity Maximisation
  • Proof: Proof Sketch
  • Corollary 3.1: High-Synergy Subset Tends to be Grouped Together
  • ...and 5 more