Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild

Xinyu Zhao, Guoheng Sun, Ruisi Cai, Yukun Zhou, Pingzhi Li, Peihao Wang, Bowen Tan, Yexiao He, Li Chen, Yi Liang, Beidi Chen, Binhang Yuan, Hongyi Wang, Ang Li, Zhangyang Wang, Tianlong Chen

TL;DR

Model-GLUE tackles the challenge of democratized LLM scaling by benchmarking and orchestrating merging and mixture across a large, heterogeneous model zoo. It introduces a clustering-based selective merging pipeline and several model mixture configurations (FFN-, block-, and model-level), together with router design choices and input strategies, including a hybrid approach. Empirical results on the Llama-2-based Which16 model zoo show an average improvement of 5.61% over the best single model without extra training, and the framework remains robust across diverse benchmarks and model families (including Mistral). The work provides a practical recipe for assembling diverse LLM capabilities while discussing limitations, energy considerations, and future directions such as model stacking and communication for even broader scalability.

Abstract

As Large Language Models (LLMs) excel across tasks and specialized domains, scaling LLMs based on existing models has garnered significant attention, but it faces the challenge of performance degradation when combining disparate models. Various techniques have been proposed for the aggregation of pre-trained LLMs, including model merging, Mixture-of-Experts, and stacking. Despite their merits, a comprehensive comparison and synergistic application of them to a diverse model zoo is yet to be adequately addressed. In light of this research gap, this paper introduces Model-GLUE, a holistic LLM scaling guideline. First, our work starts with a benchmarking of existing LLM scaling techniques, especially selective merging and variants of mixture. Utilizing the insights from the benchmark results, we formulate an optimal strategy for the selection and aggregation of a heterogeneous model zoo characterizing different architectures and initialization. Our methodology involves the clustering of mergeable models and optimal merging strategy selection, and the integration of clusters through a model mixture. Finally, evidenced by our experiments on a diverse Llama-2-based model zoo, Model-GLUE shows an average performance enhancement of 5.61%, achieved without additional training. Code is available at: https://github.com/Model-GLUE/Model-GLUE.

Paper Structure (71 sections, 4 figures, 19 tables, 7 algorithms)

Figures (4)

  • Figure 1: Overview of Model-GLUE, comprising (1) Model Clustering based on architecture and weight similarity; (2) Model Filtering and Searching for merging; (3) Model Merging within each cluster; (4) Model-Level Mixture of merged models.
  • Figure 2: Pipeline for model merging, as well as an overview of merging methods and search strategies.
  • Figure 3: The overview and decision flow of three model mixture levels and their selection philosophy.
  • Figure 4: (a) Comparison between different Heuristic Strategies on Which12, Which8, Which4. (b) Comparison of different model merging methods in Evolutionary Strategy.
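
To make stages (1) and (3) of the Figure 1 caption more concrete, here is a minimal Python sketch of clustering a toy model zoo by architecture and weight similarity, then merging each cluster. It is an illustration under stated assumptions, not the paper's implementation: the cosine-similarity measure, the greedy clustering rule, the `threshold` value, the uniform-average merge, and all function and model names are hypothetical. Filtering and search (stage 2) and the model-level mixture (stage 4) are omitted.

```python
import math
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flattened values.
Weights = Dict[str, List[float]]


def same_architecture(a: Weights, b: Weights) -> bool:
    """Treat two models as the same architecture if parameter names and sizes match."""
    return a.keys() == b.keys() and all(len(a[k]) == len(b[k]) for k in a)


def cosine_similarity(a: Weights, b: Weights) -> float:
    """Cosine similarity over all parameters (assumes matching shapes, non-zero weights)."""
    dot = norm_a = norm_b = 0.0
    for name in a:
        for x, y in zip(a[name], b[name]):
            dot += x * y
            norm_a += x * x
            norm_b += y * y
    return dot / (math.sqrt(norm_a) * math.sqrt(norm_b))


def cluster_models(zoo: Dict[str, Weights], threshold: float) -> List[List[str]]:
    """Greedy clustering (an assumption): join the first cluster whose first
    member has the same architecture and similarity >= threshold."""
    clusters: List[List[str]] = []
    for name, weights in zoo.items():
        for cluster in clusters:
            representative = zoo[cluster[0]]
            if (same_architecture(representative, weights)
                    and cosine_similarity(representative, weights) >= threshold):
                cluster.append(name)
                break
        else:
            # No compatible cluster found: start a new one.
            clusters.append([name])
    return clusters


def average_merge(models: List[Weights]) -> Weights:
    """Uniform parameter averaging, used here as the simplest possible merge."""
    n = len(models)
    return {
        key: [sum(values) / n for values in zip(*(m[key] for m in models))]
        for key in models[0]
    }


# Hypothetical four-model zoo; "small" has a different parameter shape.
zoo = {
    "chat": {"w": [1.0, 2.0, 3.0]},
    "code": {"w": [1.0, 2.0, 5.0]},
    "other": {"w": [-3.0, 1.0, 0.0]},
    "small": {"w": [1.0, 2.0]},
}
clusters = cluster_models(zoo, threshold=0.9)
merged = [average_merge([zoo[name] for name in cluster]) for cluster in clusters]
```

In this toy zoo, "chat" and "code" share an architecture and have similar weights, so they fall into one cluster and are averaged, while "other" (dissimilar weights) and "small" (different shape) each remain in a singleton cluster. In the pipeline described above, the merged model of each cluster would then be passed on to the mixture stage.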