Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity

Zichen Song, Sitan Huang, Yuxin Wu, Zhongfeng Kang

TL;DR

This paper first explores layer importance using the Activation Variance-Sparsity Score (AVSS), then proposes an enhanced version tailored to assess hallucination propensity across layers (EAVSS), offering a comprehensive framework for both layer importance evaluation and hallucination mitigation in LLMs.

Abstract

Evaluating the importance of different layers in large language models (LLMs) is crucial for optimizing model performance and interpretability. This paper first explores layer importance using the Activation Variance-Sparsity Score (AVSS), which combines normalized activation variance and sparsity to quantify each layer's contribution to overall model performance. By ranking layers based on AVSS and pruning the least impactful 25%, our experiments on tasks such as question answering, language modeling, and sentiment classification show that over 90% of the original performance is retained, highlighting potential redundancies in LLM architectures. Building on AVSS, we propose an enhanced version tailored to assess hallucination propensity across layers (EAVSS). This improved approach introduces Hallucination-Specific Activation Variance (HSAV) and Hallucination-Specific Sparsity (HSS) metrics, allowing precise identification of hallucination-prone layers. By incorporating contrastive learning on these layers, we effectively mitigate hallucination generation, contributing to more robust and efficient LLMs, with a maximum performance improvement of 12%. Our results on the NQ, SciQ, TriviaQA, TruthfulQA, and WikiQA datasets demonstrate the efficacy of this method, offering a comprehensive framework for both layer importance evaluation and hallucination mitigation in LLMs.
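
The ranking-and-pruning step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact AVSS formula, so the version below assumes a score of activation variance divided by activation sparsity, normalized across layers. The function names `avss_scores` and `layers_to_prune`, the near-zero threshold `zero_tol`, and the stabilizer `eps` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def avss_scores(layer_activations, eps=1e-6, zero_tol=1e-6):
    """Return one normalized score per layer (assumed form: variance / sparsity)."""
    raw = []
    for acts in layer_activations:
        acts = np.asarray(acts, dtype=np.float64)
        variance = acts.var()                        # spread of the layer's activations
        sparsity = np.mean(np.abs(acts) < zero_tol)  # fraction of near-zero activations
        raw.append(variance / (sparsity + eps))      # eps avoids division by zero
    raw = np.array(raw)
    return raw / raw.sum()                           # normalize so scores sum to 1

def layers_to_prune(scores, fraction=0.25):
    """Indices of the lowest-scoring `fraction` of layers, in ascending order."""
    k = int(len(scores) * fraction)
    return sorted(np.argsort(scores)[:k].tolist())
```

Under this assumed form, a layer with low variance and many near-zero activations receives a low score and becomes a pruning candidate, which matches the intuition stated in the abstract. With fully dense layers the `eps` term dominates the denominator, so the absolute scores depend on `eps`; only the relative ranking is meaningful in this sketch.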

Paper Structure

This paper contains 35 sections, 60 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Illustration of the Activation Variance-Sparsity Score (AVSS) method for assessing layer importance in large language models. (a) Layer Structure: Overview of model layers (1 to 32) analyzed for activation properties. (b) Activation Variance and Sparsity: Top: High-variance layers capture diverse information. Bottom: Darker cells indicate sparse activations, suggesting redundancy. (c) AVSS Calculation and Ranking: AVSS, normalized AVSS, and cumulative AVSS formulas are used to rank layers, identifying low-scoring layers as pruning candidates.
  • Figure 2: Comparison of layer deletion strategies based on AVSS and layer traversal. In subfigure (a), layers marked within the green box are identified for deletion using the AVSS (Activation Variance-Sparsity Score) method. Subfigure (b) shows the top six layers selected for deletion after exhaustively traversing each layer and ranking their importance, with the selected layers highlighted in the yellow box. Noticeable differences exist between the layers identified by AVSS and those from traversal, with AVSS-based layer selection achieving superior experimental performance.
  • Figure 3: Layer-wise performance comparison for five tasks (NQ, SciQ, TriviaQA, TruthfulQA, WikiQA) on the GPT-2 model. Each subplot shows the variation of four metrics (accuracy@50, coverage@50, ECE, and Brier score) across 24 layers. Distinct activation patterns highlight key layers crucial for task-specific processing and model reliability, guiding targeted hallucination mitigation based on layer importance.
  • Figure 4: Layer-wise Activation Variance, L1 Norm, and L2 Norm for LLaMa-3B on The Pile and HackerNews datasets (top two rows), and DistilBERT on SQuAD (bottom row).
  • Figure 5: Layer-wise Activation Variance, L1 Norm, L2 Norm, Frobenius Norm, and Activation Sparsity for DistilBERT on The Pile (top two rows) and HackerNews (bottom two rows).