RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models

Zukang Xu, Xing Hu, Qiang Wu, Dawei Yang

TL;DR

RSAVQ tackles the challenge of deploying ultra-large language models on resource-limited devices by introducing a geometry-aware vector quantization framework. It combines Error Direction Sensitivity Guidance (EDSG), which realigns quantization errors along the low-sensitivity directions given by the negative natural gradient $-\tilde{\nabla}\mathcal{L} = -\mathbf{F}_W^{-1}\nabla\mathcal{L}$, with Weight Channel Sensitivity Guidance (WCSG), which allocates bits across channels using curvature measures derived from the Fisher Information Matrix (FIM). The method builds on Product Quantization with a geometry-informed objective, and uses a Kronecker-factored FIM approximation to compute the per-channel sensitivity $I_c$ and derive a principled bit-allocation rule $b_c \propto \log_2 I_c$. Empirically, RSAVQ achieves state-of-the-art performance in 2-bit quantization across multiple LLaMA models, improving perplexity and zero-shot accuracy while delivering notable speedups and reduced memory, thereby enabling practical deployment in constrained environments. The work bridges information geometry and neural quantization, offering both theoretical insight and a practical pathway to efficient large-scale language models.
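To make the bit-allocation rule $b_c \propto \log_2 I_c$ concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function name `allocate_bits`, the `scale` parameter, and the sample sensitivities are hypothetical, and the affine form (budget mean plus a log-sensitivity offset) is the classic rate-distortion allocation, swapped in here as an assumption so that the average bit-width matches a prescribed budget. The paper's exact rule, rounding, and clipping may differ.

```python
import math


def allocate_bits(sensitivities, avg_bits, scale=0.5):
    """Return real-valued per-channel bit-widths whose mean equals avg_bits.

    Assumed rule (illustrative, not taken from the paper):
        b_c = avg_bits + scale * (log2(I_c) - mean_c log2(I_c))
    so channels with larger sensitivity I_c receive more bits.
    """
    if not sensitivities:
        raise ValueError("need at least one channel")
    if any(s <= 0 for s in sensitivities):
        raise ValueError("sensitivities must be positive")
    logs = [math.log2(s) for s in sensitivities]
    mean_log = sum(logs) / len(logs)
    # Offsets sum to zero, so the average bit-width stays on budget.
    return [avg_bits + scale * (value - mean_log) for value in logs]


# Hypothetical sensitivities for four channels (illustrative values only).
bits = allocate_bits([1.0, 4.0, 16.0, 64.0], avg_bits=2.0)
print(bits)
```

The output is real-valued. A practical quantizer would still need to round these widths to supported values and re-balance the budget afterwards, which this sketch leaves out.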

Abstract

Large language models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, their exponentially growing parameter counts pose significant challenges for deployment on resource-constrained devices. Vector Quantization (VQ) shows great promise for low-bit quantization (e.g., 2 to 4 bits), but existing work faces two key challenges: unconstrained error direction and suboptimal bit allocation. In this paper, we propose RSAVQ, a novel VQ framework to enhance extremely low-bit quantization for LLMs. RSAVQ introduces two geometry-driven innovations that effectively mitigate the above limitations: (1) Error Direction Sensitivity Guidance (EDSG), which leverages the Fisher Information Matrix (FIM)-induced Riemannian metric to project quantization errors onto low-sensitivity directions in the parameter space. Specifically, this projection is performed along the negative natural gradient direction, which effectively suppresses error expansion. (2) Weight Channel Sensitivity Guidance (WCSG), which constructs a channel-wise sensitivity metric via FIM curvature analysis to dynamically guide bit resource allocation. The approach facilitates a globally optimal quantization solution within prescribed bit constraints. Experiments demonstrate that RSAVQ outperforms existing methods for LLMs. For example, in 2-bit quantization of LLaMA-3 8B, RSAVQ leads baselines like VPTQ and QuIP# by 0.4 in perplexity (PPL) and 1.5 in zero-shot accuracy. This work offers a practical solution for constrained environments and a theoretical bridge between information geometry and the quantization of neural networks, advancing efficient deep learning.
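For readers who want the identity behind the natural-gradient direction, the following standard result may help. It is not quoted from the paper: the perturbation $\delta$ and radius $\epsilon$ are symbols introduced here for illustration, and $\mathbf{F}_W$ is assumed positive definite. Among all perturbations of fixed size under the FIM-induced metric, the first-order loss change is largest along the natural gradient:

$$
\arg\max_{\delta:\ \delta^\top \mathbf{F}_W \delta \le \epsilon^2} \nabla\mathcal{L}^\top \delta
\;=\;
\epsilon\,\frac{\mathbf{F}_W^{-1}\nabla\mathcal{L}}{\sqrt{\nabla\mathcal{L}^\top \mathbf{F}_W^{-1}\nabla\mathcal{L}}}
$$

By symmetry, the minimizer is the same vector with the opposite sign, so to first order a perturbation of that size lowers the loss most when it points along $-\mathbf{F}_W^{-1}\nabla\mathcal{L}$. This is one plausible reading of why EDSG steers quantization errors toward the negative natural gradient.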

Paper Structure

This paper contains 41 sections, 38 equations, 8 figures, 13 tables, 1 algorithm.

Figures (8)

  • Figure 1: Schematic diagram of loss perturbation from isotropic weight perturbations: natural gradient direction yields maximum loss increment.
  • Figure 2: (a) Weight distribution of a sampled patch in the down-projection layer of the final block of the LLaMA-3 8B model. (b) Change in loss after applying the same perturbation (+0.05 per element) to each input channel of panel (a).
  • Figure 3: WikiText-2 PPL (left) and average zero-shot accuracy (right) for LLaMA-3 8B quantized at different bit-widths.
  • Figure 4: Clustering process of error projection along negative natural gradient direction.
  • Figure 5: Weight channel sensitivity analysis and comparison.
  • ...and 3 more figures