LaCo: Large Language Model Pruning via Layer Collapse

Yifei Yang; Zouying Cao; Hai Zhao

TL;DR

This work introduces Layer Collapse (LaCo), a layer-wise structured pruning method. LaCo merges consecutive layers with a Reserving-Differences-while-Seeking-Common (RDSC) merge and guides the pruning with representation similarity measured on a small calibration set. By collapsing rear layers into an earlier one, LaCo reaches pruning ratios of 25–30% while maintaining over 80% of average task performance, and it outperforms existing state-of-the-art structured pruning methods. Because the internal dimensions and the overall architecture are unchanged, the pruned model can be deployed in the same way as the original. Post-training experiments indicate that the pruned model effectively inherits the parameters of the original, and the method is evaluated across a range of pruning ratios and several LLMs. These results suggest substantial redundancy in current LLMs and offer a practical, hardware-friendly path to smaller, faster models without retraining from scratch.
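The merge step can be illustrated with a small sketch. This summary does not state the merge formula, so the rule below is an assumption inferred from the Figure 1 caption (parameter differencing, then parameter merging): each later layer's difference from the base layer is added onto the base layer. The names `Params` and `rdsc_merge`, and the use of flat Python lists in place of weight tensors, are hypothetical choices for illustration only.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened weight values.
Params = Dict[str, List[float]]


def rdsc_merge(layers: List[Params], start: int, m: int) -> List[Params]:
    """Collapse layers start+1 .. start+m into layer `start`.

    Assumed rule (not given in this summary):
        merged = base + sum_{k=1..m} (layers[start + k] - base)
    """
    if m < 1 or start < 0 or start + m >= len(layers):
        raise ValueError("merge window falls outside the layer stack")

    base = layers[start]
    merged: Params = {}
    for name, base_weights in base.items():
        out = list(base_weights)
        for k in range(1, m + 1):
            later_weights = layers[start + k][name]
            # Reserving-Differences: take (later - base) per parameter.
            # Seeking-Common: accumulate that difference onto the base.
            out = [
                o + (later_w - base_w)
                for o, later_w, base_w in zip(out, later_weights, base_weights)
            ]
        merged[name] = out

    # The merged layer replaces the whole window; later layers shift forward,
    # so every remaining layer keeps its original shape.
    return layers[:start] + [merged] + layers[start + m + 1:]
```

For example, merging the first three of four toy layers with `rdsc_merge(layers, 0, 2)` leaves two layers; with base weights `[1.0, 2.0]` and later weights `[1.5, 2.0]` and `[1.0, 3.0]`, the merged weights are `[1.5, 3.0]`. Since only whole layers are removed, the per-layer dimensions stay the same, which matches the claim that the architecture is unchanged.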

Abstract

Large language models (LLMs) based on the transformer architecture are steadily growing in size, which brings considerable costs to both training and inference. However, existing compression methods such as model quantization, knowledge distillation, and model pruning are constrained by various issues, including limited hardware support, the need for extensive training, and alterations to the model's internal structure. In this paper, we propose a concise layer-wise structured pruner called Layer Collapse (LaCo), in which rear model layers collapse into a prior layer, enabling a rapid reduction in model size while preserving the model structure. Comprehensive experiments show that our method maintains an average task performance of over 80% at pruning ratios of 25–30%, significantly outperforming existing state-of-the-art structured pruning methods. We also conduct post-training experiments to confirm that LaCo effectively inherits the parameters of the original model. Additionally, we perform ablation studies on various settings of LaCo. Finally, we discuss our motivation from the perspective of layer-wise similarity and evaluate the performance of the pruned LLMs across various pruning ratios. Code is available at https://github.com/yangyifei729/LaCo.

Paper Structure (41 sections, 1 equation, 5 figures, 25 tables, 1 algorithm)

Introduction
Method
Reserving-Differences-while-Seeking-Common Layer Merge
Layer Collapse
Complexity Analysis
Experiments
Models
Benchmarks
Baselines
Settings
Main Results
Comparison of Perplexity
Pruning Time
Memory Usage and Inference Speed
Further Analysis
...and 26 more sections

Figures (5)

Figure 1: An example of Reserving-Differences-while-Seeking-Common (RDSC) Layer Merge. In (a), we perform parameter differencing, which we regard as Reserving-Differences. In (b), we conduct parameter merging, which we interpret as Seeking-Common.
Figure 2: An illustration of Layer Collapse.
Figure 3: Loss curves for post-training.
Figure 4: The L2 similarity of corresponding matrices between adjacent layers.
Figure 5: The cosine similarity of layer representations.
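
The two similarity measures named in the captions of Figures 4 and 5 can be sketched as follows. The captions do not define "L2 similarity", so this sketch assumes it refers to the Euclidean distance between flattened matrices (smaller means more similar). The function names and the use of plain Python lists as stand-ins for weight matrices and hidden-state vectors are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two representation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two flattened matrices (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adjacent_scores(
    items: List[Vector], score: Callable[[Vector, Vector], float]
) -> List[float]:
    """Apply `score` to each pair of neighbouring layers (i, i + 1)."""
    return [score(items[i], items[i + 1]) for i in range(len(items) - 1)]
```

Passing a list of per-layer vectors to `adjacent_scores` with either metric gives one score per pair of neighbouring layers, which is the kind of layer-by-layer profile the two figure captions describe.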
