Investigating Layer Importance in Large Language Models
Yang Zhang, Yanfei Dong, Kenji Kawaguchi
TL;DR
The paper investigates how individual layers in large language models contribute to overall performance by extending Shapley value attribution to layers and pairing it with layer-wise ablation. It introduces an efficient proximity-based sampling method to estimate layer Shapley values and analyzes how removing specific layers impacts performance. A key finding is the existence of cornerstone layers—typically early in the network—whose removal can cause a collapse to random guessing, while non-cornerstone layers have relatively small effects. The study also compares FFN and MoE architectures, observing that MoE-based models can be less dependent on any single cornerstone, possibly due to regularization via sparse activation. The work advances mechanistic interpretability in LLMs and suggests directions for more interpretable and efficient model design.
Abstract
Large language models (LLMs) have gained increasing attention due to their prominent ability to understand and process text. Nevertheless, LLMs remain largely opaque. This lack of understanding has obstructed their deployment in safety-critical scenarios and hindered the development of better models. In this study, we advance the understanding of LLMs by investigating the significance of their individual layers. We propose an efficient sampling method to faithfully evaluate the importance of layers using Shapley values, a widely used explanation framework in feature attribution and data valuation. In addition, we conduct layer ablation experiments to assess the performance degradation that results from excluding specific layers. Our findings reveal the existence of cornerstone layers, wherein certain early layers can exhibit a dominant contribution over others. Removing a single cornerstone layer leads to a drastic collapse in model performance, often reducing it to random guessing. Conversely, removing non-cornerstone layers results in only marginal performance changes. This study identifies cornerstone layers in LLMs and underscores their critical role for future research.
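To make the two measurements in the abstract concrete, the following is a minimal sketch of layer-level Shapley estimation and single-layer ablation. It uses plain Monte Carlo permutation sampling, not the efficient sampling method proposed in the paper, which is not specified in this summary. The names `shapley_permutation_estimate`, `ablation_drop`, and `toy_value` are hypothetical, and `toy_value` is an invented stand-in for evaluating a real model with only a subset of its layers active; its numbers are illustrative, not results.

```python
import random
from typing import Callable, FrozenSet, List


def shapley_permutation_estimate(
    n_layers: int,
    value_fn: Callable[[FrozenSet[int]], float],
    n_samples: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate each layer's Shapley value by averaging its marginal
    contribution over randomly sampled layer orderings."""
    rng = random.Random(seed)
    totals = [0.0] * n_layers
    for _ in range(n_samples):
        order = list(range(n_layers))
        rng.shuffle(order)
        active = set()
        prev = value_fn(frozenset(active))
        for layer in order:
            active.add(layer)
            cur = value_fn(frozenset(active))
            totals[layer] += cur - prev  # marginal contribution of this layer
            prev = cur
    return [t / n_samples for t in totals]


def ablation_drop(
    n_layers: int, value_fn: Callable[[FrozenSet[int]], float]
) -> List[float]:
    """Performance lost when each layer is removed from the full model."""
    full = frozenset(range(n_layers))
    base = value_fn(full)
    return [base - value_fn(full - {layer}) for layer in range(n_layers)]


def toy_value(active: FrozenSet[int]) -> float:
    """Hypothetical accuracy on a 4-way task (assumption, not real data).
    Layer 0 plays the cornerstone role: without it the score stays at chance."""
    if 0 not in active:
        return 0.25
    return 0.60 + 0.05 * (len(active) - 1)


if __name__ == "__main__":
    print("Shapley estimates:", shapley_permutation_estimate(4, toy_value))
    print("Ablation drops:   ", ablation_drop(4, toy_value))
```

In this toy setting, layer 0 receives by far the largest attribution and its removal returns the score to chance, mirroring the cornerstone behavior described above in a highly simplified form. With a real model, `value_fn` would run a benchmark evaluation with the inactive layers skipped, which is the expensive step that motivates a more efficient sampling scheme.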
