MergePipe: A Budget-Aware Parameter Management System for Scalable LLM Merging

Yuanyi Wang; Yanggan Gu; Zihao Wang; Kunxi Li; Yifan Yang; Zhaoyi Yan; Congkai Xie; Jianmin Wu; Hongxia Yang

TL;DR

MergePipe is the first system to treat LLM merging as a data management and execution problem. It introduces a catalog-driven abstraction over model parameters, merge plans, and execution lineage, and by making expert reads budget-aware it mitigates the $O(K)$ I/O growth of naive pipelines and achieves predictable scaling behavior.
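To make the catalog abstraction concrete, here is a minimal sketch under stated assumptions, not MergePipe's actual schema. It assumes parameters are indexed per (model, tensor) block with a content hash that enables reuse, that the catalog persists which blocks each plan selects, and that every committed output records the plan and input models that produced it. All names (`BlockRecord`, `Catalog`, `register_block`, `record_plan`, `record_commit`) are hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlockRecord:
    """Hypothetical catalog entry for one parameter block (here: one named tensor)."""
    model_id: str
    tensor_name: str
    nbytes: int
    digest: str  # content hash; identical blocks across models share a digest


@dataclass
class Catalog:
    """Hypothetical catalog tracking parameters, merge plans, and execution lineage."""
    blocks: dict[tuple[str, str], BlockRecord] = field(default_factory=dict)
    plans: dict[str, list[tuple[str, str]]] = field(default_factory=dict)
    lineage: list[tuple[str, str, list[str]]] = field(default_factory=list)

    def register_block(self, model_id: str, tensor_name: str, data: bytes) -> BlockRecord:
        # Index the block by (model, tensor) and fingerprint its contents for reuse checks.
        rec = BlockRecord(model_id, tensor_name, len(data), hashlib.sha256(data).hexdigest())
        self.blocks[(model_id, tensor_name)] = rec
        return rec

    def record_plan(self, plan_id: str, selected: list[tuple[str, str]]) -> None:
        # Persist which (model, tensor) blocks the planner chose to read.
        self.plans[plan_id] = list(selected)

    def record_commit(self, output_id: str, plan_id: str) -> None:
        # Append a lineage record linking the published output to its plan and input models.
        inputs = sorted({model_id for model_id, _ in self.plans[plan_id]})
        self.lineage.append((output_id, plan_id, inputs))
```

Keeping plans and lineage in the same catalog as the block index is what allows a later merge to ask which blocks were already read and which outputs depend on a given expert.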

Abstract

Large language model (LLM) merging has become a key technique in modern LLM development pipelines, enabling the integration of multiple task- or domain-specific expert models without retraining. However, existing merging implementations treat model parameters as unstructured files and execute merges in a stateless, one-shot manner; as the number of experts grows, this leads to excessive disk I/O, redundant parameter scans, and poor scalability. In this paper, we present MergePipe, a parameter management system for scalable LLM merging. MergePipe is the first system to treat LLM merging as a data management and execution problem, and it introduces a catalog-driven abstraction over model parameters, merge plans, and execution lineage. At its core, MergePipe employs a cost-aware planner that explicitly models expert parameter I/O and enforces user-specified I/O budgets, followed by a streaming execution engine that materializes merged models under transactional guarantees. Our key insight is that while base model reads and output writes are unavoidable, expert parameter reads dominate merge cost and constitute the primary optimization target. By making expert access budget-aware throughout planning and execution, MergePipe mitigates the $O(K)$ I/O growth of naive pipelines, where $K$ is the number of experts, and achieves predictable scaling behavior. Experiments show that MergePipe reduces total I/O by up to an order of magnitude and delivers up to $11\times$ end-to-end speedups (up to 90% wall-time reduction) over state-of-the-art LLM merging pipelines.
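The cost argument in the abstract can be checked with simple arithmetic. The sketch below assumes that total I/O decomposes into base reads, expert reads, and output writes; that a naive pipeline scans each of the $K$ expert checkpoints in full; and that a budget caps expert reads. The function names and the unit sizes in the usage note are illustrative assumptions, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOCost:
    """I/O volume of one merge, split into the three components named in the abstract."""
    base_read: float
    expert_read: float
    output_write: float

    @property
    def total(self) -> float:
        return self.base_read + self.expert_read + self.output_write


def naive_cost(base: float, expert: float, k: int, out: float) -> IOCost:
    # Naive pipeline (assumed): read the base once, scan all K expert checkpoints, write once.
    return IOCost(base, k * expert, out)


def budgeted_cost(base: float, expert: float, k: int, out: float, budget: float) -> IOCost:
    # Budget-aware execution (assumed): expert reads are capped at the I/O budget,
    # while base reads and output writes are unavoidable and stay unchanged.
    return IOCost(base, min(k * expert, budget), out)
```

With base, expert, and output sizes of 6 units each and a budget of 12 units, `naive_cost(6, 6, 8, 6).total` is 60 while `budgeted_cost(6, 6, 8, 6, 12).total` is 24. Doubling $K$ to 16 raises the naive total to 108 but leaves the budgeted total at 24, which is the bounded-growth behavior the abstract describes.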

Paper Structure (34 sections, 16 equations, 7 figures, 7 tables, 2 algorithms)

Introduction
System Overview
Design Goals and Scope
Data Model and Catalog
Merge Planning and Execution Workflow
Cost Model for Model Merging
Overview and Assumptions
Cost Decomposition
Block-Level Cost Estimation
Budget-Constrained Planning Objective
Merge Planning
Planner Interface and Scope
Cost Binding to the Model
Conflict-Aware Signals
Budget-Aware Plan Generation
...and 19 more sections

Figures (7)

Figure 1: Naive LLM merging vs. MergePipe. Naive pipelines merge in a stateless, one-shot manner and repeatedly scan expert checkpoints ($O(K)$ expert I/O), while MergePipe introduces planning, reuse, and budget-bounded expert reads.
Figure 2: Naive merging scales poorly with the number of experts. For 3B models, both TIES and DARE show near-linear growth in total I/O (bars) as $K$ increases, and wall time rises accordingly.
Figure 3: MergePipe system overview. MergePipe decouples parameter storage, budget-aware planning, and streaming execution for scalable LLM merging. A persistent catalog supports block-level indexing/reuse; a cost- and conflict-aware planner selects expert blocks under an explicit I/O budget; and an execution engine enforces the plan via DeltaIterator with atomic publish and lineage/explainability records.
Figure 4: Scaling with the number of experts. (a) Expert read I/O (top) and end-to-end wall time (bottom). Naive merging repeatedly scans expert checkpoints, leading to near-linear growth in expert I/O and runtime as the number of experts increases. MergePipe enforces an explicit expert I/O budget at execution time, keeping expert reads bounded and significantly reducing wall time. (b) Total I/O shows that naive pipelines become increasingly expert-read dominated, while MergePipe reduces total I/O by limiting expert access. (c) I/O composition at the maximum number of experts illustrates that MergePipe shifts the dominant cost from expert reads to unavoidable base reads and output writes. All experiments are conducted on CPU using the same merge operator (TIES).
Figure 5: (a) Planning+flush+commit is a small fraction of wall time. (b) Budgeting mainly reduces expert reads. (c) Pre-budget expert read I/O scales with $K$ across models.
...and 2 more figures
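Figure 3 describes execution as a DeltaIterator that enforces the plan, followed by an atomic publish. The sketch below illustrates that pattern under stated assumptions and is not MergePipe's engine: `delta_iterator` and `merge_and_publish` are hypothetical names, tensors are plain Python lists, expert blocks are assumed to match the base block's shape, the merge operator is a simple mean of deltas standing in for TIES or DARE, and atomicity comes from writing to a temporary file in the destination directory and renaming it with `os.replace`.

```python
import json
import os
import tempfile

BYTES_PER_VALUE = 8  # assumption: float64 storage when charging expert reads


def delta_iterator(base, experts, plan, budget_bytes):
    """Hypothetical DeltaIterator: yield (tensor_name, deltas) for planned expert blocks
    only, raising before any read that would push cumulative expert I/O over budget."""
    read = 0
    for name, base_block in base.items():
        deltas = []
        for expert_id in plan.get(name, []):
            # Charge the read before issuing it (expert block assumed same shape as base).
            read += BYTES_PER_VALUE * len(base_block)
            if read > budget_bytes:
                raise RuntimeError("expert I/O budget exceeded")
            expert_block = experts[expert_id][name]
            deltas.append([e - b for e, b in zip(expert_block, base_block)])
        yield name, deltas


def merge_and_publish(base, experts, plan, budget_bytes, out_path):
    """Stream merged tensors to a temp file, then publish atomically via rename."""
    out_dir = os.path.dirname(os.path.abspath(out_path))
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for name, deltas in delta_iterator(base, experts, plan, budget_bytes):
                merged = list(base[name])
                if deltas:
                    # Stand-in merge operator: base plus the mean of the selected deltas.
                    for i in range(len(merged)):
                        merged[i] += sum(d[i] for d in deltas) / len(deltas)
                f.write(json.dumps({"name": name, "values": merged}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # Atomic on POSIX when tmp_path and out_path are on the same filesystem.
        os.replace(tmp_path, out_path)
    except BaseException:
        # Discard the partial output so no half-written model is ever visible.
        os.unlink(tmp_path)
        raise
```

If the budget is exceeded mid-merge, the temporary file is deleted and nothing appears at `out_path`; readers see either the previous state or the complete merged model, which is the all-or-nothing property a transactional publish is meant to provide.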

Theorems & Definitions (2)

definition 1: Merge Plan
definition 2: Feasibility
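Definitions 1 and 2 are listed by title only on this page. One plausible reading, assumed here purely for illustration, is that a merge plan is a set of selected (expert, block) reads and that a plan is feasible when its total expert read volume fits within the user's I/O budget. The sketch below encodes that reading with a greedy score-per-byte selector; `Candidate`, `is_feasible`, `greedy_plan`, and the `score` field are hypothetical, and greedy selection is a stand-in, not necessarily the paper's budget-aware plan generation algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical unit a planner can choose to read: one block from one expert."""
    expert_id: str
    block: str
    nbytes: int   # assumed > 0
    score: float  # hypothetical utility, e.g. delta magnitude minus a conflict penalty


def is_feasible(plan, budget_bytes):
    # Feasibility (as assumed here): total expert bytes read must fit within the budget.
    return sum(c.nbytes for c in plan) <= budget_bytes


def greedy_plan(candidates, budget_bytes):
    # Greedy knapsack heuristic: take candidates in descending score-per-byte order,
    # skipping any that would exceed the budget. The result is feasible by construction.
    plan, used = [], 0
    for c in sorted(candidates, key=lambda c: c.score / c.nbytes, reverse=True):
        if used + c.nbytes <= budget_bytes:
            plan.append(c)
            used += c.nbytes
    return plan
```

Greedy selection by score per byte always returns a feasible plan under this definition but is not guaranteed to be optimal; an exact planner would instead solve the corresponding 0/1 knapsack problem.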