Maximizing the Spectral Energy Gain in Sub-1-Bit LLMs via Latent Geometry Alignment

Banseok Lee, Youngmin Kim

TL;DR

We propose LittleBit-2, a framework that combines Internal Latent Rotation with Joint Iterative Quantization (Joint-ITQ). It acts as a geometric preconditioner, aligning coherent latent distributions with the binary hypercube at zero inference overhead.

Abstract

We identify the Spectral Energy Gain in extreme model compression, where low-rank binary approximations outperform tiny-rank floating-point baselines for heavy-tailed spectra. However, prior attempts fail to realize this potential, trailing state-of-the-art 1-bit methods. We attribute this degradation to Latent Geometry Misalignment: standard singular vectors exhibit high coherence (spiky distribution), the worst-case geometry for binary quantization. To realize this gain, we propose LittleBit-2, a framework employing Internal Latent Rotation and Joint Iterative Quantization (Joint-ITQ). This approach acts as a geometric preconditioner, aligning coherent latent distributions with the binary hypercube with zero inference overhead. Empirically, LittleBit-2 establishes a new state-of-the-art in the sub-1-bit regime (1$\sim$0.1 bpp) on Llama-2 and Llama-3, matching the fidelity of leading 1-bit baselines.

Paper Structure

This paper contains 55 sections, 3 theorems, 25 equations, 14 figures, 4 tables, 2 algorithms.

Key Result

Proposition 4.1

Let the quantization noise be $\mathcal{E}_{\text{quant}}(r) = \int_{0}^{r} \Lambda\, \sigma(x)^2\,dx$ and the truncation error be $\mathcal{E}_{\text{trunc}}(r) = \int_{r}^{\infty} \sigma(x)^2\,dx$, assuming an average distortion coefficient $\Lambda$. Strategy B (binary, rank $r_B$) outperforms Strategy A ([...]). This implies a critical threshold $\gamma^*$; Strategy B is superior for heavy-tailed distributions.
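The trade-off in Proposition 4.1 can be explored numerically. The sketch below is an illustration under stated assumptions, not the paper's derivation: the condition itself is truncated in this summary, so Strategy A is assumed to be a floating-point rank-$r_A$ approximation that pays only truncation error (following the abstract's "tiny-rank floating-point baselines"), the spectrum is assumed to follow a power law $\sigma_i = i^{-\gamma}$, the integrals are replaced by finite sums over `n` singular values, and the values of `n`, `r_a`, `r_b`, and `lam` are hypothetical.

```python
def break_even(gamma, n=4096, r_a=16, r_b=256, lam=0.3):
    """Return (error_A, error_B) for an assumed spectrum sigma_i = i**(-gamma).

    error_A: truncation error only (float factors at rank r_a, assumed noise-free).
    error_B: quantization noise lam * energy kept, plus truncation beyond rank r_b.
    All default values are illustrative assumptions, not taken from the paper.
    """
    energy = [i ** (-2.0 * gamma) for i in range(1, n + 1)]  # sigma_i^2
    err_a = sum(energy[r_a:])
    err_b = lam * sum(energy[:r_b]) + sum(energy[r_b:])
    return err_a, err_b


if __name__ == "__main__":
    # Small gamma means slow spectral decay, i.e. a heavy tail.
    for gamma in (0.3, 0.6, 1.0, 1.5):
        err_a, err_b = break_even(gamma)
        winner = "B (binary)" if err_b < err_a else "A (float)"
        print(f"gamma={gamma:.1f}  E_A={err_a:.4f}  E_B={err_b:.4f}  -> {winner}")
```

Under these assumptions, Strategy B wins exactly when the energy it recovers between ranks $r_A$ and $r_B$ exceeds the quantization noise $\Lambda \sum_{i \le r_B} \sigma_i^2$ it introduces, which is why slowly decaying spectra favor the binary strategy.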

Figures (14)

  • Figure 1: Latent Geometry Alignment. (a) Standard singular vectors exhibit high coherence (spiky distribution), clustering along the axes. This creates a geometric mismatch with the binary quantization targets (black dots at ($\pm 1$, $\pm 1$)). (b) LittleBit-2 employs Internal Latent Rotation via Joint-ITQ. Acting as a geometric preconditioner, it rotates latent factors to align with binary hypercube diagonals. This minimizes quantization noise (min error) and maximizes the optimization margin (Bimodal distribution).
  • Figure 2: The LittleBit-2 Framework Pipeline. Starting from a truncated SVD of the pretrained weight $W$, LittleBit-2 (lower path) explicitly addresses the geometric misalignment. The factors $\hat{U}$ and $\hat{V}$ are concatenated and fed into the Joint-ITQ solver to optimize an orthogonal rotation $R$. This rotation is applied ($\times$) to $\hat{U}$ and $\hat{V}$, transforming the spiky latent distribution (blue histogram) into an aligned bimodal distribution (red histogram). Finally, Dual-SVID and QAT extract the FP16 scales ($h, l, g$) and learn the binary factors ($U_b, V_b^T$).
  • Figure 3: Latent Geometry Misalignment. We visualize the local distortion coefficient $\lambda$ across latent rows. Standard initialization (LB) suffers from geometric outliers (max $\lambda \approx 0.88$), which act as spikes that degrade the precision of the shared floating-point scales. LittleBit-2 effectively suppresses these outliers through internal rotation, reducing the peak distortion to $0.29$ and minimizing quantization noise.
  • Figure 4: Visualization of Latent Geometry Alignment. Histograms of latent factor $\hat{U}$ derived from the q_proj ($W_Q$) in the middle (15th) layer of Llama-2 7B. Applying Internal Rotation transforms the distribution into a Gaussian (Orange), effectively mitigating outliers.
  • Figure 5: Evolution of Latent Geometry via Joint-ITQ. Histograms of latent factors ($\hat{U}, \hat{V}$) from the Llama-2 7B 15th layer K projection (first two latent dimensions). (Left) Raw SVD factors exhibit high coherence, concentrating probability mass near the zero decision boundary while containing significant outliers. (Right) Joint-ITQ (Iter 50) transforms this into a bimodal distribution, explicitly aligning the geometry with the binary vertices $\{\pm 1\}$ to maximize the decision margin.
  • ...and 9 more figures
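The rotation step described in the Figure 2 caption can be sketched with the classical ITQ alternation (sign assignment followed by an orthogonal Procrustes update) applied to the concatenated factors. This is a minimal sketch under assumptions, not the authors' implementation: the function name `joint_itq`, the identity initialization, and the synthetic sparse factors are hypothetical, and the Dual-SVID and QAT stages are omitted.

```python
import numpy as np


def joint_itq(U, V, n_iter=50):
    """Learn one orthogonal r x r rotation R shared by U (m x r) and V (n x r).

    Classical ITQ alternation on the stacked factors (an assumed reading of
    "Joint-ITQ"): fix R and pick the nearest hypercube vertices B, then fix B
    and solve the orthogonal Procrustes problem for R.
    """
    Z = np.vstack([U, V])                      # concatenate factors row-wise
    R = np.eye(Z.shape[1])                     # identity start (assumption)
    for _ in range(n_iter):
        B = np.where(Z @ R >= 0, 1.0, -1.0)    # nearest vertex of {-1, +1}^r
        P, _, Qt = np.linalg.svd(Z.T @ B)      # Procrustes: max_R tr(B^T Z R)
        R = P @ Qt
    return R


# Synthetic "spiky" factors: mostly zeros with a few large entries (illustrative).
rng = np.random.default_rng(0)
U = rng.standard_normal((64, 8)) * (rng.random((64, 8)) < 0.2)
V = rng.standard_normal((48, 8)) * (rng.random((48, 8)) < 0.2)

R = joint_itq(U, V)
U_rot, V_rot = U @ R, V @ R

# The rotation cancels in the product, so the reconstruction is unchanged.
print(np.allclose(U_rot @ V_rot.T, U @ V.T))
# Total absolute mass before and after rotation; ITQ never decreases it.
print(np.abs(np.vstack([U, V])).sum(), np.abs(np.vstack([U_rot, V_rot])).sum())
```

Because $R$ is orthogonal, $(\hat{U}R)(\hat{V}R)^T = \hat{U}RR^T\hat{V}^T = \hat{U}\hat{V}^T$, so the rotation can be absorbed into the factors before binarization, which is consistent with the zero-inference-overhead claim. Each ITQ iteration cannot decrease $\sum_{ij} |(ZR)_{ij}|$, the quantity that pushes entries away from the zero decision boundary.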

Theorems & Definitions (4)

  • Proposition 4.1: Spectral Break-Even Condition
  • Lemma 4.2: Distortion-Geometry Duality
  • Definition 4.3: Coordinate Incoherence
  • Theorem 4.4: Delocalization via Rotation
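The statements of these results are not reproduced in this summary. As a rough illustration of why rotating a spiky vector helps binary quantization, the sketch below applies a fixed normalized Hadamard rotation (a swapped-in stand-in, not the learned Joint-ITQ rotation) to a one-hot vector and measures a distortion proxy. The proxy `binary_distortion`, the relative squared error of the best scaled sign code, is an assumption and may differ from the paper's $\lambda$.

```python
import math


def hadamard(r):
    """Orthonormal r x r Hadamard matrix via Sylvester construction (r a power of 2)."""
    H = [[1.0]]
    while len(H) < r:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    s = 1.0 / math.sqrt(r)
    return [[s * x for x in row] for row in H]


def rotate(x, R):
    """Row vector times matrix: x @ R."""
    return [sum(x[k] * R[k][j] for k in range(len(x))) for j in range(len(R[0]))]


def binary_distortion(x):
    """min over a of ||x - a*sign(x)||^2 / ||x||^2, equal to 1 - ||x||_1^2 / (d * ||x||_2^2)."""
    l1 = sum(abs(v) for v in x)
    l2 = sum(v * v for v in x)
    return 1.0 - l1 * l1 / (len(x) * l2)


spiky = [1.0] + [0.0] * 7              # fully coherent: all energy on one axis
flat = rotate(spiky, hadamard(8))      # energy spread evenly over all coordinates

print(binary_distortion(spiky))        # 1 - 1/8 = 0.875
print(binary_distortion(flat))         # approximately 0 (every entry equals 1/sqrt(8))
```

A one-hot vector sits on a coordinate axis, as far as possible from the hypercube diagonals, while its rotated image lies on a diagonal and is represented exactly by a scaled sign vector.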