Investigating Layer Importance in Large Language Models

Yang Zhang; Yanfei Dong; Kenji Kawaguchi

TL;DR

The paper investigates how individual layers in large language models contribute to overall performance by extending Shapley value attribution to layers and pairing it with layer-wise ablation. It introduces an efficient proximity-based sampling method to estimate layer Shapley values and analyzes how removing specific layers affects performance. A key finding is the existence of cornerstone layers, typically early in the network, whose removal can cause performance to collapse to random guessing, while removing non-cornerstone layers has a relatively small effect. The study also compares FFN and MoE architectures, observing that MoE-based models can be less dependent on any single cornerstone layer, possibly due to regularization via sparse activation. The work advances mechanistic interpretability in LLMs and suggests directions for more interpretable and efficient model design.

Abstract

Large language models (LLMs) have gained increasing attention due to their prominent ability to understand and process text. Nevertheless, LLMs remain largely opaque. This lack of understanding has obstructed their deployment in safety-critical scenarios and hindered the development of better models. In this study, we advance the understanding of LLMs by investigating the significance of their individual layers. We propose an efficient sampling method to faithfully evaluate the importance of layers using Shapley values, a widely used explanation framework in feature attribution and data valuation. In addition, we conduct layer ablation experiments to assess the performance degradation resulting from the exclusion of specific layers. Our findings reveal the existence of cornerstone layers, wherein certain early layers can exhibit a dominant contribution over others. Removing one cornerstone layer leads to a drastic collapse of model performance, often reducing it to random guessing. Conversely, removing non-cornerstone layers results in only marginal performance changes. This study identifies cornerstone layers in LLMs and underscores their critical role for future research.
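To make the idea of a layer Shapley value concrete, the following is a minimal sketch of a plain Monte Carlo permutation estimator. It is not the authors' sampling method (the outline lists early truncation and neighborhood sampling, whose details are not given here). The function names, the four-layer setup, and `toy_value` are all hypothetical; `toy_value` merely stands in for "task accuracy when only the given layers are kept".

```python
import random


def shapley_permutation_estimate(n_layers, value_fn, n_samples=200, seed=0):
    """Estimate per-layer Shapley values by averaging marginal contributions
    over randomly sampled layer orderings (plain permutation sampling)."""
    rng = random.Random(seed)
    totals = [0.0] * n_layers
    for _ in range(n_samples):
        order = list(range(n_layers))
        rng.shuffle(order)
        kept = set()
        prev = value_fn(frozenset(kept))
        for layer in order:
            kept.add(layer)
            cur = value_fn(frozenset(kept))
            totals[layer] += cur - prev  # marginal contribution of `layer`
            prev = cur
    return [t / n_samples for t in totals]


def toy_value(kept):
    """Hypothetical value function (an assumption, not from the paper).

    Layer 0 plays the role of a cornerstone: without it the score stays at
    chance level (0.25 for a four-way multiple-choice task)."""
    if 0 not in kept:
        return 0.25
    return 0.25 + 0.45 + 0.05 * (len(kept) - 1)


phi = shapley_permutation_estimate(n_layers=4, value_fn=toy_value)
for layer, value in enumerate(phi):
    print(f"layer {layer}: estimated Shapley value {value:.3f}")
```

In this toy game the exact Shapley values are 0.525 for layer 0 and 0.025 for each of the other layers, so the estimate for layer 0 dominates. The estimates always sum to the gap between the full model's score and the empty model's score, because the marginal contributions within each sampled ordering telescope.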

Paper Structure (29 sections, 10 equations, 6 figures, 4 tables)

Introduction
Related Work
Analyse parts of LLMs:
Model probing:
Mechanistic interpretability:
Study intermediate representation:
Preliminaries
Layers in LLMs
Shapley Value
Estimate Layer Shapley
Early truncation:
Neighborhood sampling:
Complexity analysis:
Mechanistic Interpretation via Layer-wise Ablation
Experiments
...and 14 more sections

Figures (6)

Figure 1: Illustration of single-layer ablation. A layer is ablated by removing the block while keeping the skip connection across the layer. We ablate the same layers used for the layer Shapley calculation, namely attention layers and FFN layers. For Mixtral 8x7B, we ablate attention layers and MoE layers. More details can be found in Section \ref{sec:layer_ablation}.
Figure 2: Proportion of estimated layer Shapley values for each layer. We calculate each layer's Shapley value as a proportion of the total over all layers in the model. The layers in the pie chart are arranged in ascending order of their proximity to the model input, moving anticlockwise from the top of the chart. The four layers with the largest contributions are labeled. Across all three models (rows) and six tasks (columns), we observe a disproportionately high contribution from a few layers, typically early layers. Together, these layers account for a significant portion of the overall layer importance. For example, in Llama3 70B, the top four layers contribute $47.6\%$ of model performance, as indicated by Shapley values. See Section \ref{sec:shapley_results} for further discussion. Attn refers to attention layers, FFN refers to fully connected layers, and MoE refers to Mixture-of-Experts layers.
Figure 3: Layer ablation results for Llama3 8B. The x-axis shows the layer ID of the removed layer. The y-axis shows the accuracy after this layer is removed. Attention layers are colored red, and FFN layers are colored blue. Removing a single cornerstone layer can cause model performance to drop immediately to the level of random guessing. See Section \ref{sec:layer_ablation_results} for further discussion.
Figure 4: Layer ablation results for Llama3 70B. The x-axis shows the layer ID of the removed layer. The y-axis shows the accuracy after this layer is removed. Attention layers are colored red, and FFN layers are colored blue. As with Llama3 8B, removing a single cornerstone layer causes the model's performance to degrade to the level of random guessing. See Section \ref{sec:layer_ablation_results} for further discussion.
Figure 5: Layer ablation results for Mixtral 8x7B. The x-axis shows the layer ID of the removed layer. The y-axis shows the accuracy after this layer is removed. Attention layers are colored red, and MoE layers are colored blue. Removing a single layer generally decreases model performance. However, even after cornerstone layers are ablated, the performance of Mixtral 8x7B remains above random guessing, suggesting that layer contributions are more balanced in LLMs that use MoE layers instead of FFN layers. See Section \ref{sec:layer_ablation_results} for further discussion.
...and 1 more figures
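
The single-layer ablation described in the Figure 1 caption (remove the block, keep the skip connection) can be sketched on a toy residual stack. This is an illustrative assumption rather than the authors' code: `run_residual_stack`, `scale_block`, and the three toy blocks are hypothetical, and each block stands in for an attention, FFN, or MoE sublayer.

```python
def run_residual_stack(x, blocks, ablated=frozenset()):
    """Apply residual blocks h <- h + block(h).

    An ablated block contributes nothing, so only its skip connection
    remains and the hidden state passes through unchanged."""
    h = list(x)
    for idx, block in enumerate(blocks):
        if idx in ablated:
            continue  # block removed; the skip connection carries h forward
        update = block(h)
        h = [a + b for a, b in zip(h, update)]
    return h


def scale_block(factor):
    """Hypothetical toy block that returns a scaled copy of its input."""
    return lambda h: [factor * v for v in h]


blocks = [scale_block(1.0), scale_block(0.5), scale_block(-0.25)]
x = [1.0, 2.0]

full = run_residual_stack(x, blocks)

# Single-layer ablation sweep: remove one block at a time.
sweep = {i: run_residual_stack(x, blocks, ablated={i}) for i in range(len(blocks))}

print("full model:", full)
for i, out in sweep.items():
    print(f"without block {i}:", out)
```

In a real model the sweep would score each ablated variant on a benchmark and compare the accuracy with that of the full model, rather than printing raw activations.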
