Configurable Foundation Models: Building LLMs from a Modular Perspective

Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun

TL;DR

This work reframes large language models as configurable foundation models built from modular bricks, distinguishing emergent bricks formed during pre-training from customized bricks added post-training. It formalizes four brick-oriented operations (retrieval and routing, merging, updating, and growing) and provides empirical evidence that FFN layers exhibit activation sparsity and functional specialization, supporting modular design. The paper surveys emergent and customized bricks (including task, knowledge, and modality bricks) and analyzes brick granularity from solitary neurons to full models, arguing for hierarchical, reusable, and traceable configurations. It also discusses practical challenges and open directions, such as efficient brick construction, evaluation, and multi-model cooperation, with the aim of driving scalable, efficient foundation-model architectures.

Abstract

Recent advances in LLMs have exposed challenges in computational efficiency and continual scalability that stem from their huge parameter counts, making it increasingly cumbersome to apply and evolve these models on devices with limited computation resources and in scenarios requiring diverse abilities. Inspired by modularity within the human brain, there is a growing tendency to decompose LLMs into numerous functional modules, as in mixture-of-experts, allowing for inference with only a subset of modules and dynamic assembly of modules to tackle complex tasks. To highlight the inherent efficiency and composability of the modular approach, we coin the term brick to represent each functional module, and call the modularized structure a configurable foundation model. In this paper, we offer a comprehensive overview and investigation of the construction, utilization, and limitations of configurable foundation models. We first formalize modules into emergent bricks, which are functional neuron partitions that emerge during the pre-training phase, and customized bricks, which are constructed via additional post-training to improve the capabilities and knowledge of LLMs. Based on diverse functional bricks, we further present four brick-oriented operations: retrieval and routing, merging, updating, and growing. These operations allow for dynamic configuration of LLMs based on instructions to handle complex tasks. To verify our perspective, we conduct an empirical analysis of widely used LLMs. We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions. Finally, we highlight several open issues and directions for future research. Overall, this paper aims to offer a fresh modular perspective on existing LLM research and inspire the future creation of more efficient and scalable foundation models.

Paper Structure

This paper contains 53 sections, 2 equations, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: The illustration of a configurable foundation model consisting of emergent and customized bricks. For a given instruction, we select and combine tiny bricks to build an efficient instruction-specific model with minimal performance loss.
  • Figure 2: The illustration for emergent bricks. Starting from a randomly initialized model, functional differentiation emerges after the pre-training phase. Neurons with similar functionalities can be aggregated to form small functional bricks. Here, the model is divided into two human-defined layer bricks, each of which is further subdivided into several self-organized expert bricks.
  • Figure 3: The illustration for three typical customized bricks, including task bricks, knowledge bricks, and modality bricks.
  • Figure 4: The illustration for brick router and retrieval. It is only necessary to retrieve a subset of bricks to participate in the computation for each instruction.
  • Figure 5: Two widely-used operations for brick combination. (a) Parameter weighted average performs an element-wise average of multiple bricks with the same structures. (b) Brick stitching sequentially concatenates bricks together for complex reasoning.
  • ...and 5 more figures
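
The two combination operations named in the Figure 5 caption can be illustrated with a minimal sketch. It is not the paper's implementation: the `Brick` type (a mapping from parameter name to a flat list of weights), the function names `weighted_average` and `stitch`, and the toy values are all assumptions made for illustration. Part (a) is shown as an element-wise weighted average over bricks with identical structure, and part (b) as sequential composition in which each brick's output feeds the next.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of weights.
Brick = Dict[str, List[float]]


def weighted_average(bricks: Sequence[Brick], weights: Sequence[float]) -> Brick:
    """Element-wise weighted average of bricks that share one structure."""
    if not bricks or len(bricks) != len(weights):
        raise ValueError("need at least one brick and one weight per brick")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    reference = bricks[0]
    for brick in bricks:
        same_names = brick.keys() == reference.keys()
        if not same_names or any(len(brick[n]) != len(reference[n]) for n in reference):
            raise ValueError("bricks must have identical structure")
    merged: Brick = {}
    for name in reference:
        merged[name] = [
            sum(w * b[name][i] for b, w in zip(bricks, weights)) / total
            for i in range(len(reference[name]))
        ]
    return merged


def stitch(bricks: Sequence[Callable[[float], float]]) -> Callable[[float], float]:
    """Chain bricks sequentially so each output becomes the next input."""
    def run(x: float) -> float:
        for brick in bricks:
            x = brick(x)
        return x
    return run


if __name__ == "__main__":
    a = {"w": [1.0, 3.0]}
    b = {"w": [3.0, 5.0]}
    print(weighted_average([a, b], [1.0, 1.0]))
    pipeline = stitch([lambda x: x + 1, lambda x: x * 2])
    print(pipeline(3))
```

Normalizing by the sum of the weights keeps the merged parameters on the same scale as the inputs, and the structure check mirrors the caption's requirement that averaged bricks have the same structure. Real bricks would hold tensors rather than Python lists, and stitched bricks would generally need compatible input and output shapes.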