Optimal Scalar Quantization for Matrix Multiplication: Closed-Form Density and Phase Transition

Calvin Ang, Sungyoon Kim, Mert Pilanci

Abstract

We study entrywise scalar quantization of two matrices prior to multiplication. Given $A\in\mathbb{R}^{m\times k}$ and $B\in\mathbb{R}^{k\times n}$, we quantize the entries of $A$ and $B$ independently using scalar quantizers with $K_X$ and $K_Y$ levels per entry, and form $\widehat C=\widehat A\,\widehat B$. The objective is to minimize the matrix multiplication mean-squared error (MSE) $\mathcal{E}=\mathbb{E}[\|AB-\widehat A\widehat B\|_F^2]$ under a pair-i.i.d. inner-product model. In the high-resolution regime $K_X,K_Y\to\infty$, we derive a sharp $K^{-2}$ asymptotic expansion for $\mathcal{E}$, identify the exact optimal leading constants, and characterize asymptotically optimal quantization center densities in terms of conditional second moments. We then specialize to correlated Gaussian multiplicative pairs, obtaining a closed-form optimal point density \[ \lambda^\star(u)\ \propto\ \exp\!\left(-\frac{u^2}{6}\right)\bigl((1-\rho^2)+\rho^2u^2\bigr)^{1/3}, \qquad u=\frac{x}{\sigma_X}, \] with the same form for $y/\sigma_Y$, and prove a correlation-driven phase transition: the density is unimodal at the origin for $|\rho|\leq 1/\sqrt{3}$ and becomes bimodal for $|\rho|>1/\sqrt{3}$ with peaks at $u_{\mathrm{peak}}=\pm\sqrt{3-1/\rho^2}$. We demonstrate the method's applicability in synthetic experiments, including matrix multiplication quantization and least squares optimization, as well as in the quantization of large language model key and query activations.
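
The closed-form density and the stated peak locations can be checked numerically. The following is a minimal sketch, assuming NumPy; the function names, the grid range, and the grid resolution are illustrative choices and do not come from the paper.

```python
import numpy as np

def optimal_point_density(u, rho):
    """Unnormalized lambda*(u) as written in the abstract, with u = x / sigma_X."""
    u = np.asarray(u, dtype=float)
    return np.exp(-u**2 / 6.0) * ((1.0 - rho**2) + rho**2 * u**2) ** (1.0 / 3.0)

def predicted_peaks(rho):
    """Peak locations stated in the abstract.

    A single peak at 0 for |rho| <= 1/sqrt(3); otherwise +-sqrt(3 - 1/rho^2).
    """
    if abs(rho) <= 1.0 / np.sqrt(3.0):
        return np.array([0.0])
    p = np.sqrt(3.0 - 1.0 / rho**2)
    return np.array([-p, p])

def numerical_peaks(rho, grid=None):
    """Find local maxima of the density on a grid (illustrative resolution)."""
    if grid is None:
        grid = np.linspace(-6.0, 6.0, 24001)
    lam = optimal_point_density(grid, rho)
    # Interior point that is larger than its left neighbor and not smaller
    # than its right neighbor.
    is_peak = (lam[1:-1] > lam[:-2]) & (lam[1:-1] >= lam[2:])
    return grid[1:-1][is_peak]

if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.7, 0.9):
        print(rho, numerical_peaks(rho), predicted_peaks(rho))
```

On such a grid, the numerically located maxima should agree with `predicted_peaks` up to the grid spacing, which makes the unimodal and bimodal regimes easy to compare side by side.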

Paper Structure (32 sections, 17 theorems, 119 equations, 5 figures, 1 table)


Key Result

Theorem 1

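Under the standing assumptions (from the pair-i.i.d. model through the regularity conditions), the matrix multiplication MSE $\mathcal{E}$ admits a sharp $K^{-2}$ high-rate expansion with exact optimal leading constants. Moreover, asymptotically optimal $K_X$- and $K_Y$-level quantizers are companding quantizers with point densities characterized by conditional second moments.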
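
A companding quantizer with a prescribed point density can be realized by placing reproduction levels at evenly spaced quantiles of that density. The sketch below does this for the Gaussian closed-form density from the abstract. It is a generic construction, not necessarily the authors' implementation: the helper names, the truncation range `u_max`, the grid size, and the nearest-level encoding rule are all assumptions made for illustration.

```python
import numpy as np

def gaussian_pair_density(u, rho):
    """Unnormalized closed-form point density from the abstract (u = x / sigma)."""
    u = np.asarray(u, dtype=float)
    return np.exp(-u**2 / 6.0) * ((1.0 - rho**2) + rho**2 * u**2) ** (1.0 / 3.0)

def companding_levels(K, rho, u_max=10.0, grid_size=200001):
    """Place K levels at the (i + 1/2)/K quantiles, i = 0..K-1, of the density.

    The density is truncated to [-u_max, u_max] (an assumption of this sketch).
    """
    grid = np.linspace(-u_max, u_max, grid_size)
    lam = gaussian_pair_density(grid, rho)
    # Cumulative trapezoid integral of the density, normalized to a CDF.
    steps = 0.5 * (lam[1:] + lam[:-1]) * np.diff(grid)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]
    targets = (np.arange(K) + 0.5) / K
    # Invert the CDF by linear interpolation.
    return np.interp(targets, cdf, grid)

def quantize(x, levels, sigma=1.0):
    """Map each entry of x to the nearest level; levels are in units of sigma."""
    scaled = sigma * np.asarray(levels, dtype=float)
    # Nearest-level encoding: thresholds are midpoints between adjacent levels.
    thresholds = 0.5 * (scaled[1:] + scaled[:-1])
    idx = np.searchsorted(thresholds, x)
    return scaled[idx]

if __name__ == "__main__":
    levels = companding_levels(K=16, rho=0.9)
    rng = np.random.default_rng(0)
    x = rng.standard_normal(5)
    print(levels)
    print(x, quantize(x, levels))
```

For $|\rho|>1/\sqrt{3}$ the levels produced this way are most tightly spaced near the two density peaks rather than near the origin, which is the practical consequence of the bimodal regime.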

Figures (5)

  • Figure 1: Optimal density phase transition. When $\rho=0$, the density has a single mode at the origin. Once $\left\lvert \rho \right\rvert$ exceeds the critical value $1/\sqrt{3}$, this mode splits into two symmetric modes.
  • Figure 2: Performance of our optimal quantizer vs. other commonly used quantizers.
  • Figure 3: Comparison of three different schemes for solving quantized least squares. The $y$ axis is the difference between ground truth and the quantized solution $\|W - W^{*}\|_F$.
  • Figure 4: Estimated $\rho$ value for each layer and head in GPT-2 Small.
  • Figure 5: Tuned $|\rho|$ for GPT-2 Small.

Theorems & Definitions (30)

  • Theorem 1: High-rate optimal matrix multiplication MSE
  • proof
  • Corollary 1: Rate form and optimal bit split
  • Corollary 2: Closed-form asymptotically optimal point density
  • Theorem 2: Unimodal-to-bimodal transition at $\left\lvert \rho \right\rvert=1/\sqrt{3}$
  • Corollary 3: Gaussian high-rate constant for matrix multiplication MSE
  • Lemma 1: Cell width formula
  • proof
  • Lemma 2: Reproduction points are second-order close to cell midpoints
  • proof
  • ...and 20 more
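
The closed-form density (Corollary 2) and the transition threshold (Theorem 2) can be sanity-checked by hand. The sketch below is not the paper's proof. It assumes a zero-mean bivariate Gaussian pair $(X,Y)$ with correlation $\rho$, $|\rho|<1$, and it assumes that the general point density takes the classical weighted high-resolution form $\lambda_X(x)\propto\bigl(p_X(x)\,\mathbb{E}[Y^2\mid X=x]\bigr)^{1/3}$, which is consistent with the abstract's "conditional second moments" but is not stated in the text above. With $u=x/\sigma_X$, the Gaussian conditional law gives

\[ Y\mid X=x\ \sim\ \mathcal{N}\bigl(\rho\,\sigma_Y u,\ \sigma_Y^2(1-\rho^2)\bigr) \quad\Longrightarrow\quad \mathbb{E}[Y^2\mid X=x]=\sigma_Y^2\bigl((1-\rho^2)+\rho^2u^2\bigr), \]

so that, using $p_X(x)\propto\exp(-u^2/2)$,

\[ \bigl(p_X(x)\,\mathbb{E}[Y^2\mid X=x]\bigr)^{1/3}\ \propto\ \exp\!\left(-\frac{u^2}{6}\right)\bigl((1-\rho^2)+\rho^2u^2\bigr)^{1/3}, \]

which matches the density in the abstract. Differentiating its logarithm,

\[ \frac{d}{du}\log\lambda^\star(u) = -\frac{u}{3}+\frac{2\rho^2u}{3\bigl((1-\rho^2)+\rho^2u^2\bigr)} = \frac{u}{3}\cdot\frac{3\rho^2-1-\rho^2u^2}{(1-\rho^2)+\rho^2u^2}. \]

If $|\rho|\leq 1/\sqrt{3}$, the factor $3\rho^2-1-\rho^2u^2$ is negative for every $u\neq 0$, so the derivative has the sign of $-u$ and the only maximum is at $u=0$. If $|\rho|>1/\sqrt{3}$, the derivative is positive on $0<u<\sqrt{3-1/\rho^2}$ and negative beyond, so $u=0$ becomes a local minimum and the maxima sit at $u_{\mathrm{peak}}=\pm\sqrt{3-1/\rho^2}$, in agreement with the stated threshold.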