
AVSS: Layer Importance Evaluation in Large Language Models via Activation Variance-Sparsity Analysis

Zichen Song, Yuxin Wu, Sitan Huang, Zhongfeng Kang

TL;DR

By identifying and removing approximately the 25% of layers with the lowest AVSS, this work retains over 90% of the original model's performance on tasks such as question answering, language modeling, and sentiment classification, indicating that these layers may be non-essential.

Abstract

The evaluation of layer importance in deep learning has been an active area of research, with significant implications for model optimization and interpretability. Recently, large language models (LLMs) have gained prominence across various domains, yet few studies have examined the functional importance and performance contribution of individual layers within LLMs, especially from the perspective of activation distributions. In this work, we propose the Activation Variance-Sparsity Score (AVSS), a novel metric that combines normalized activation variance and sparsity to assess each layer's contribution to model performance. By identifying and removing approximately the 25% of layers with the lowest AVSS, we retain over 90% of the original model's performance on tasks such as question answering, language modeling, and sentiment classification, indicating that these layers may be non-essential. Our approach provides a systematic method for identifying less critical layers and contributes to more efficient large language model architectures.
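The abstract does not spell out the exact formula, so the following is only a minimal sketch of how a per-layer variance-sparsity score could be computed, under stated assumptions: each layer's activations are flattened into one list, "sparsity" is taken to be the fraction of activations whose magnitude falls below a small `threshold`, and the score is assumed to be variance divided by sparsity (so high variance raises it and many near-zero activations lower it, matching the intuition in Figure 1). The function names `layer_variance`, `layer_sparsity`, and `avss` and the parameters `threshold` and `eps` are hypothetical, not taken from the paper.

```python
from statistics import pvariance
from typing import Sequence


def layer_variance(acts: Sequence[float]) -> float:
    """Population variance of a layer's flattened activations."""
    return pvariance(acts)


def layer_sparsity(acts: Sequence[float], threshold: float = 1e-3) -> float:
    """Fraction of activations with magnitude below `threshold` (assumed definition)."""
    return sum(1 for a in acts if abs(a) < threshold) / len(acts)


def avss(acts: Sequence[float], threshold: float = 1e-3, eps: float = 1e-8) -> float:
    """Assumed AVSS form: activation variance divided by activation sparsity.

    `eps` avoids division by zero when no activation is near zero.
    """
    return layer_variance(acts) / (layer_sparsity(acts, threshold) + eps)


if __name__ == "__main__":
    # Toy flattened activations for three hypothetical layers.
    layers = {
        "layer_1": [0.0, 0.0, 0.0, 2.0],   # sparse, moderate spread
        "layer_2": [-3.0, 1.5, 0.0, 2.5],  # dense, wide spread
        "layer_3": [0.0, 0.0, 0.1, 0.0],   # sparse, tiny spread
    }
    for name, acts in layers.items():
        print(name, round(avss(acts), 4))
```

In this toy input the dense, widely spread layer scores highest and the sparse, nearly constant layer scores lowest, which is the ordering the variance-sparsity intuition predicts. In a real model the activation lists would come from hidden states captured during a forward pass over evaluation data.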

Paper Structure

This paper contains 11 sections, 9 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Illustration of the Activation Variance-Sparsity Score (AVSS) method for assessing layer importance in large language models. (a) Layer Structure: Overview of model layers (1 to 32) analyzed for activation properties. (b) Activation Variance and Sparsity: Top: High-variance layers capture diverse information. Bottom: Darker cells indicate sparse activations, suggesting redundancy. (c) AVSS Calculation and Ranking: AVSS, normalized AVSS, and cumulative AVSS formulas are used to rank layers, identifying low-scoring layers as pruning candidates.
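The ranking step in panel (c) can be sketched as follows, again under assumptions not confirmed by this summary: normalized AVSS is taken to be each layer's score divided by the sum over all layers, layers are ranked in ascending order of normalized AVSS, and the lowest `prune_fraction` of layers (default 0.25, mirroring the roughly 25% mentioned in the TL;DR) become pruning candidates. The cumulative normalized AVSS of those candidates is reported as the share of total score they account for. The helpers `normalize_scores` and `pruning_candidates` are hypothetical names for illustration only.

```python
from typing import Dict, List, Tuple


def normalize_scores(scores: Dict[int, float]) -> Dict[int, float]:
    """Divide each layer's AVSS by the total so scores sum to 1 (assumed normalization)."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("AVSS scores must have a positive sum")
    return {layer: s / total for layer, s in scores.items()}


def pruning_candidates(
    scores: Dict[int, float], prune_fraction: float = 0.25
) -> Tuple[List[int], float]:
    """Return the lowest-scoring layers and the cumulative normalized AVSS they cover.

    Layers are sorted by normalized AVSS (least important first) and the first
    floor(len(scores) * prune_fraction) of them are selected.
    """
    normalized = normalize_scores(scores)
    ranked = sorted(normalized, key=lambda layer: normalized[layer])
    k = int(len(ranked) * prune_fraction)
    candidates = ranked[:k]
    cumulative = sum(normalized[layer] for layer in candidates)
    return sorted(candidates), cumulative


if __name__ == "__main__":
    # Hypothetical AVSS values for an 8-layer model (layer index -> score).
    scores = {1: 8.0, 2: 1.0, 3: 6.0, 4: 2.0, 5: 9.0, 6: 3.0, 7: 4.0, 8: 7.0}
    layers, share = pruning_candidates(scores)
    print(layers, round(share, 3))  # [2, 4] 0.075
```

Selecting by a fixed fraction of layers is only one reading of the figure; an alternative is to keep adding the lowest-ranked layers until their cumulative normalized AVSS crosses a chosen threshold, which ties the pruning budget to score mass rather than layer count.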