Surgical Feature-Space Decomposition of LLMs: Why, When and How?

Arnav Chavan, Nahush Lele, Deepak Gupta

TL;DR

This work introduces Surgical Feature-Space Decomposition (SFSD), a training-free, layerwise compression method for transformer-based LLMs that operates in the feature space to achieve low-rank representations. By contrasting weight-space and feature-space decompositions and employing a per-layer surgical rank search, SFSD demonstrates superior preservation of downstream commonsense reasoning and reduction of bias compared to pruning and baseline approaches. The method includes bias compensation for discarded eigenvectors, and empirical results on LLaMA-7B and Mistral-7B show notable efficiency (CPU-only decomposition) and performance gains at modest compression, with consistent bias mitigation. The study concludes with insights into when and how SFSD is advantageous, while acknowledging limitations in scale and the need for continued exploration of rank-search strategies.
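To make the idea of decomposing in feature space concrete, here is a minimal NumPy sketch of a PCA-style factorization of a single linear layer. It is an illustration under stated assumptions, not the authors' implementation: the function name `feature_space_decompose`, the random calibration batch, and the use of the output-feature covariance are all hypothetical choices made for this example. It shows how the top eigenvectors of the output features can split one layer into two thinner ones, and how the mean of the discarded directions can be folded into the bias.

```python
import numpy as np


def feature_space_decompose(W, b, X_calib, rank):
    """Factor y = W x + b into two thinner linear maps (illustrative sketch).

    W: (d_out, d_in) weight, b: (d_out,) bias,
    X_calib: (n, d_in) calibration inputs (hypothetical data source).
    Returns W1 (rank, d_in), W2 (d_out, rank), b2 (d_out,) such that
    y is approximated by W2 @ (W1 @ x) + b2.
    """
    Y = X_calib @ W.T + b                   # output features, shape (n, d_out)
    mu = Y.mean(axis=0)                     # mean output feature
    cov = np.cov(Y, rowvar=False)           # (d_out, d_out) feature covariance
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :rank]          # top-`rank` eigenvectors, (d_out, rank)

    W1 = V.T @ W                            # project into the retained subspace
    W2 = V                                  # map back to the output dimension
    # Bias compensation: components along discarded eigenvectors are
    # replaced by their mean, i.e. y_hat = V V^T (y - mu) + mu.
    b2 = mu + V @ (V.T @ (b - mu))
    return W1, W2, b2


# Small demo: inputs that live in a 3-dimensional subspace of an 8-dim space.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))
b = rng.standard_normal(16)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 8))

W1, W2, b2 = feature_space_decompose(W, b, X, rank=3)
Y = X @ W.T + b
Y_hat = (X @ W1.T) @ W2.T + b2
print("relative error:", np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y))
```

In this sketch the rank is supplied by the caller. A per-layer rank search, as the summary describes, would wrap a call like this in a loop that tries candidate ranks and scores each one with a chosen metric.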

Abstract

Low-rank approximations of the weight and feature space can enhance the performance of deep learning models, whether by improving generalization or by reducing inference latency. However, there is no clear consensus yet on how, when and why these approximations are helpful for large language models (LLMs). In this work, we empirically study the efficacy of weight and feature space decomposition in transformer-based LLMs. We demonstrate that surgical decomposition not only provides critical insights into the trade-off between compression and language modelling performance, but also sometimes enhances the commonsense reasoning performance of LLMs. Our empirical analysis identifies specific network segments that intrinsically exhibit a low-rank structure. Furthermore, we extend our investigation to the implications of low-rank approximations on model bias. Overall, our findings offer a novel perspective on optimizing LLMs, presenting the low-rank approximation not only as a tool for performance enhancements, but also as a means to potentially rectify biases within these models. Our code is available on GitHub at https://github.com/nyunAI/SFSD-LLM.

Paper Structure (21 sections, 11 equations, 3 figures, 4 tables, 1 algorithm)

This paper contains 21 sections, 11 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Surgical Feature Space Decomposition (SFSD) of LLaMA-7B and Mistral-7B models with task-specific accuracy used as the rank search metric. Horizontal lines indicate the performance of the baseline pre-trained model.
  • Figure 2: Surgical Feature Space Decomposition (SFSD) of LLaMA-7B and Mistral-7B models with perplexity used as the rank search metric. The decomposed models are evaluated at regular intervals on commonsense reasoning tasks, and the average accuracy is reported.
  • Figure 3: Final Parametric Budget $\beta$ averaged across six commonsense reasoning tasks. 100% indicates an intact layer identical to the one in the pre-trained model. LLaMA-7B consists of 32 modules, each with query (q), key (k), value (v), output (o), gate (g), up (u) and down (d) projection layers.