PolarQuant: Optimal Gaussian Weight Quantization via Hadamard Rotation for LLM Compression

Caio Vicentino

Abstract

We present PolarQuant, a post-training weight quantization method for large language models (LLMs) that exploits the distributional structure of neural network weights to achieve near-lossless compression. PolarQuant operates in three stages: (1) block-wise normalization to the unit hypersphere, (2) Walsh-Hadamard rotation to transform coordinates into approximately Gaussian random variables, and (3) quantization with centroids matched to the Gaussian distribution. Our ablation reveals that Hadamard rotation alone accounts for 98% of the quality improvement, reducing Qwen3.5-9B perplexity from 6.90 (absmax Q5) to 6.40 (Δ = +0.03 from FP16), making it practically lossless without any calibration data. Furthermore, PolarQuant functions as an effective preprocessing step for downstream INT4 quantizers: PolarQuant Q5 dequantized and re-quantized by torchao INT4 achieves perplexity 6.56 versus 6.68 for direct absmax INT4, while maintaining 43.1 tok/s throughput at 6.5 GB VRAM. Code and models are publicly available.
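
The three-stage pipeline can be summarized with a minimal sketch. The code below is an illustrative reconstruction of the abstract's description, not the released implementation: the 128-element block size, the equal-probability codebook construction, and the helper names (hadamard, gaussian_codebook, polarquant_block, dequantize_block) are assumptions, and a production quantizer would refine the centroids and bit-pack the 5-bit codes.

# Minimal sketch of the three-stage PolarQuant pipeline described above.
# Illustrative only: block size, codebook construction, and all function
# names are assumptions, not the authors' released code.
import numpy as np
from statistics import NormalDist


def hadamard(d: int) -> np.ndarray:
    """Normalized Walsh-Hadamard matrix H_d (Sylvester construction, d a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(d)


def gaussian_codebook(bits: int = 5) -> np.ndarray:
    """Centroids matched to N(0, 1): midpoints of 2**bits equal-probability bins."""
    n = 2 ** bits
    probs = (np.arange(n) + 0.5) / n
    return np.array([NormalDist().inv_cdf(float(p)) for p in probs])


def polarquant_block(w: np.ndarray, bits: int = 5):
    """Quantize one weight block: normalize -> rotate -> Gaussian nearest-centroid."""
    d = w.size
    scale = np.linalg.norm(w)             # (1) block-wise normalization to the unit hypersphere
    b_hat = w / scale
    b_tilde = hadamard(d) @ b_hat         # (2) rotated coordinates are approximately N(0, 1/d)
    centroids = gaussian_codebook(bits)   # (3) quantize sqrt(d)*coords against N(0, 1) centroids
    codes = np.abs(np.sqrt(d) * b_tilde[:, None] - centroids[None, :]).argmin(axis=1)
    return codes.astype(np.uint8), scale


def dequantize_block(codes: np.ndarray, scale: float, bits: int = 5) -> np.ndarray:
    """Invert the pipeline: codebook lookup, inverse rotation, rescale."""
    d = codes.size
    b_tilde = gaussian_codebook(bits)[codes] / np.sqrt(d)
    return scale * (hadamard(d).T @ b_tilde)  # H is orthogonal, so H.T undoes the rotation


# Round-trip on one synthetic 128-element block.
rng = np.random.default_rng(0)
w = rng.standard_normal(128)
codes, scale = polarquant_block(w)
w_hat = dequantize_block(codes, scale)
print(float(np.abs(w - w_hat).max()))  # small reconstruction error at 5 bits per weight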

Paper Structure

This paper contains 32 sections, 4 theorems, 9 equations, 7 tables, and 2 algorithms.

Key Result

Proposition 3.2

Let $\hat{\mathbf{b}} \in \mathbb{R}^d$ be a random vector uniformly distributed on the unit sphere $\mathbb{S}^{d-1}$. Let $\mathbf{H}_d$ be the normalized Walsh--Hadamard matrix. Then for each coordinate $j$, the rotated element $\tilde{b}_j = (\mathbf{H}_d \hat{\mathbf{b}})_j$ satisfies:
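
A plausible form of the displayed conclusion, sketched here under the standard assumption that the coordinates of a uniform vector on $\mathbb{S}^{d-1}$ are asymptotically Gaussian and that the orthogonal map $\mathbf{H}_d$ preserves the uniform distribution (the paper's exact statement may include an explicit error bound):

$$\sqrt{d}\,\tilde{b}_j \;\longrightarrow\; \mathcal{N}(0, 1) \quad \text{in distribution as } d \to \infty,$$

that is, each rotated coordinate is approximately Gaussian with mean $0$ and variance $1/d$, which is what stage (3) of the pipeline exploits when matching centroids to the Gaussian distribution.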

Theorems & Definitions (8)

  • Definition 3.1: Normalized Walsh--Hadamard Matrix
  • Proposition 3.2: Gaussianity of Rotated Coordinates
  • Proof sketch (Proposition 3.2)
  • Remark 3.3
  • Theorem 3.4: Lloyd--Max Optimality Conditions (classical form recalled after this list)
  • Proposition 3.5: Symmetry
  • Proof (Proposition 3.5)
  • Proposition 3.6: MSE Advantage over Absmax
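
For context, the Lloyd--Max conditions named in Theorem 3.4 have a classical textbook form, sketched below; the paper's statement may differ in notation. For a source density $p(x)$, thresholds $t_0 = -\infty < t_1 < \cdots < t_{K-1} < t_K = +\infty$, and reconstruction levels $c_1, \dots, c_K$, the MSE-optimal scalar quantizer satisfies

$$t_k = \frac{c_k + c_{k+1}}{2} \quad (1 \le k \le K-1), \qquad c_k = \frac{\int_{t_{k-1}}^{t_k} x\, p(x)\, dx}{\int_{t_{k-1}}^{t_k} p(x)\, dx} \quad (1 \le k \le K),$$

i.e., each threshold is the midpoint of its neighboring levels (nearest-neighbor condition) and each level is the centroid of its quantization cell (centroid condition). For PolarQuant's Gaussian-matched codebook, $p$ would be the standard normal density.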