LATMiX: Learnable Affine Transformations for Microscaling Quantization of LLMs

Ofir Gordon; Lior Dikstein; Arnon Netzer; Idan Achituve; Hai Victor Habi

TL;DR

The paper tackles post-training quantization of large language models under the microscaling (MX) format, where per-block shared scales create a block-structured quantization that amplifies activation outliers. It introduces LATMiX, which learns invertible affine transformations (a global T1 and a per-block T2). The transformations are parameterized via LU or QR decompositions, optimized with a distillation loss plus a volume-preserving regularizer, and folded into the weight matrices to avoid runtime overhead. A theoretical analysis derives an MX-specific error bound that balances transform conditioning against block-level activation magnitudes, which motivates full affine transformations over block-diagonal ones. Empirically, LATMiX yields consistent improvements for low-bit MX quantization across seven zero-shot benchmarks and WikiText2 perplexity, demonstrating practical value for deploying accurate, low-resource LLMs.
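To make the effect of per-block shared scales concrete, here is a simulated ("fake-quant") MXFP4 quantizer in plain Python. It is a sketch that assumes the OCP Microscaling recipe as we understand it: blocks of 32 elements, a shared power-of-two scale of 2^(floor(log2(amax)) - 2), and FP4 E2M1 elements. Rounding ties and special values are simplified, so the paper's definition of the MX quantizer may differ in detail; the function names are our own.

```python
import math

# Representable magnitudes of the FP4 E2M1 element type used by MXFP4.
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# Exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2.
FP4_EMAX = 2


def round_to_e2m1(v):
    """Round v to the nearest signed E2M1 value; ties go to the smaller magnitude
    and magnitudes above 6.0 saturate to 6.0 (a simplification of the spec)."""
    mag = min(FP4_E2M1_GRID, key=lambda g: abs(abs(v) - g))
    return math.copysign(mag, v)


def mx_quantize(x, block_size=32):
    """Simulated MXFP4 quantization of a 1-D list: every block of `block_size`
    elements shares one power-of-two scale."""
    out = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        amax = max(abs(v) for v in block)
        if amax == 0.0:
            out.extend(0.0 for _ in block)
            continue
        # Shared scale 2^(floor(log2(amax)) - emax): the block maximum maps into [4, 8),
        # so maxima above 6 * scale are clipped by round_to_e2m1.
        scale = 2.0 ** (math.floor(math.log2(amax)) - FP4_EMAX)
        out.extend(round_to_e2m1(v / scale) * scale for v in block)
    return out


# One outlier forces a large shared scale for its whole block.
print(mx_quantize([0.1] * 31 + [24.0])[:3])  # [0.0, 0.0, 0.0]
print(mx_quantize([0.1] * 32)[:3])           # [0.09375, 0.09375, 0.09375]
```

With a single 24.0 outlier among 31 entries of 0.1, the shared scale becomes 4 and every small entry rounds to zero, whereas a block containing only 0.1s gets scale 2^-6 and keeps each entry as 0.09375. This is the block-level outlier amplification the TL;DR refers to.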

Abstract

Post-training quantization (PTQ) is a widely used approach for reducing the memory and compute costs of large language models (LLMs). Recent studies have shown that applying invertible transformations to activations can significantly improve quantization robustness by reducing activation outliers; however, existing approaches are largely restricted to rotation or Hadamard-based transformations. Moreover, most studies have focused on traditional quantization schemes, whereas modern hardware increasingly supports the microscaling (MX) data format. Attempts to combine the two have shown severe performance degradation, leading prior work to impose assumptions on the transformations. In this work, we take a complementary perspective. First, we provide a theoretical analysis of transformations under MX quantization by deriving a bound on the quantization error. Our analysis emphasizes the importance of accounting for both the activation distribution and the underlying quantization structure. Building on this analysis, we propose LATMiX, a method that generalizes outlier reduction to learnable invertible affine transformations optimized using standard deep learning tools. Experiments show consistent improvements in average accuracy for MX low-bit quantization over strong baselines on a wide range of zero-shot benchmarks, across multiple model sizes.
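The abstract contrasts fixed rotation or Hadamard-based transformations with transformations chosen for the MX structure, and Figure 2 compares such transforms. A toy version of that kind of measurement, on one synthetic 64-dimensional vector rather than real activations, is sketched below in plain Python. The MXFP4 quantizer is simplified (nearest-value rounding, no special values), and the squared reconstruction error is our stand-in for the paper's objective, whose exact form is not reproduced here; all function names are hypothetical.

```python
import math
import random

FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 magnitudes
FP4_EMAX = 2  # 6.0 = 1.5 * 2**2


def mx_quantize(x, block_size=32):
    """Simulated MXFP4: one shared power-of-two scale per block, E2M1 elements."""
    out = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        amax = max(abs(v) for v in block)
        if amax == 0.0:
            out.extend(0.0 for _ in block)
            continue
        scale = 2.0 ** (math.floor(math.log2(amax)) - FP4_EMAX)
        for v in block:
            mag = min(FP4_E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
            out.append(math.copysign(mag, v) * scale)
    return out


def hadamard(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of 2.

    The normalized Sylvester matrix is symmetric and orthogonal, so it is its own inverse.
    """
    y = list(x)
    h = 1
    while h < len(y):
        for i in range(0, len(y), 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(len(y))
    return [v / norm for v in y]


def block_hadamard(x, block_size=32):
    """Apply the orthonormal Hadamard transform independently to each MX block."""
    out = []
    for start in range(0, len(x), block_size):
        out.extend(hadamard(x[start:start + block_size]))
    return out


def quant_error(x, transform):
    """Squared error of x -> transform(Q(transform(x))); valid for self-inverse transforms."""
    x_hat = transform(mx_quantize(transform(x)))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(64)]
    x[3] = 50.0  # a single outlier channel in the first MX block
    for name, t in [("vanilla", lambda v: list(v)), ("hadamard", hadamard),
                    ("block hadamard", block_hadamard)]:
        print(f"{name:>15}: {quant_error(x, t):.3f}")
```

On toy inputs like this, the ranking of the three transforms depends on the outlier magnitude and on where the transformed values land on the E2M1 grid; for example, a transform can push a block maximum above 6 times the shared scale, where it is clipped. This sensitivity is consistent with the abstract's point that both the activation distribution and the quantization structure need to be taken into account.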

Paper Structure (27 sections, 4 theorems, 35 equations, 4 figures, 11 tables)

Introduction
Background & Notation
Microscaling (MX) Quantization
Outlier Reduction Using Rotation Matrices
Method
Theoretical & Numerical Analysis
Learning General Affine Transformation
Related Work
Experiments
Experimental Settings
Zero-shot Reasoning Tasks Evaluation
Ablation & Analysis
Conclusion
MX Quantization Error Upper Bound (Theorem \ref{thm:q_error_base})
Affine Transformation inside MHA
...and 12 more sections

Key Result

Theorem 3.3

Assume that $\bm{x}$ is a continuous random vector, $\mathbf{T}$ is an affine transformation, and $Q$ is the MX quantization defined in Eq. eq:q_mx. Then, under regularity assumptions on $\bm{x}$, … Here, $f(x) \lesssim g(x)$ denotes that $f(x)$ is bounded by $g(x)$ up to a fixed multiplicative constant, and $\norm{\cdot}_{\sigma}$ denotes the spectral norm. Furthermore, if we assume that …
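As an illustration of why the conditioning of the transform enters an MX error bound, the following elementary inequality (our own, not Theorem 3.3) holds for any invertible affine map, written here as $\mathbf{T}(\bm{x}) = \mathbf{A}\bm{x} + \bm{c}$ (notation assumed for this sketch), when the input is quantized in the transformed space and mapped back with $\mathbf{T}^{-1}$:

```latex
\begin{aligned}
\hat{\bm{x}} &= \mathbf{T}^{-1}\big(Q(\mathbf{T}(\bm{x}))\big)
             = \mathbf{A}^{-1}\big(Q(\mathbf{A}\bm{x} + \bm{c}) - \bm{c}\big),\\
\hat{\bm{x}} - \bm{x} &= \mathbf{A}^{-1}\big(Q(\mathbf{A}\bm{x} + \bm{c}) - (\mathbf{A}\bm{x} + \bm{c})\big),\\
\norm{\hat{\bm{x}} - \bm{x}}_2 &\le \norm{\mathbf{A}^{-1}}_{\sigma}\,
  \norm{Q(\mathbf{A}\bm{x} + \bm{c}) - (\mathbf{A}\bm{x} + \bm{c})}_2 .
\end{aligned}
```

The second factor is the MX quantization error in the transformed space, which a transform can reduce by spreading outliers within and across blocks; the first factor grows when $\mathbf{A}$ is ill-conditioned. This is the trade-off between transform conditioning and block-level activation magnitudes that the TL;DR attributes to the MX-specific bound.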

Figures (4)

Figure 1: LATMiX takes into account both the MX block structure and the distribution of features to diffuse outliers. In the figure, energy is distributed both within the block and among blocks to obtain lower quantization error.
Figure 2: Analysis of various transformation types: (1) Vanilla: no transformation applied; (2) Hadamard: full Hadamard transform; (3) Block Hadamard: a block-diagonal matrix in which each block corresponds to an MX block with a Hadamard matrix; (4) a learned rotation matrix; and (5) a learned affine transformation that minimizes the objective in Eq. \ref{eq:q_error}. In Fig. \ref{sfig:qe_vs_block_size}, the Hadamard and learned rotation curves are superimposed.
Figure 3: LATMiX learns a transformation that spreads the energy across the tensor.
Figure 4: Placement of all transformations in a standard LLM, with folding operations marked.
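Figure 4 marks where the transformations are folded into the model weights. For the simplest case, two linear layers with nothing in between, folding an affine map $T(x) = Ax + c$ can be sketched as follows in plain Python (hypothetical helper names): the producing layer absorbs $T$ and the consuming layer absorbs $T^{-1}$, so the network output is unchanged and no extra matrix multiply is needed at inference. In a quantized model, the MX quantizer would act on the transformed activation between the two layers. Placements that cross normalization layers, nonlinearities, or attention (cf. the section "Affine Transformation inside MHA") need extra care that this sketch does not cover.

```python
def matmul(a, b):
    """Dense matrix product for matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(r * v for r, v in zip(row, x)) for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fold_affine(w1, b1, w2, b2, a, c):
    """Fold T(x) = A x + c into layer 1 and T^{-1} into layer 2.

    Layer 1: W1' = A W1,       b1' = A b1 + c
    Layer 2: W2' = W2 A^{-1},  b2' = b2 - W2 A^{-1} c
    """
    a_inv = inverse(a)
    w1f = matmul(a, w1)
    b1f = [u + v for u, v in zip(matvec(a, b1), c)]
    w2f = matmul(w2, a_inv)
    b2f = [u - v for u, v in zip(b2, matvec(w2f, c))]
    return w1f, b1f, w2f, b2f


def two_layer(w1, b1, w2, b2, h):
    """Compute W2 (W1 h + b1) + b2 with no nonlinearity in between."""
    x = [u + v for u, v in zip(matvec(w1, h), b1)]
    return [u + v for u, v in zip(matvec(w2, x), b2)]
```

The folded network computes $W_2A^{-1}(AW_1h + Ab_1 + c) + b_2 - W_2A^{-1}c = W_2(W_1h + b_1) + b_2$, i.e. exactly the original output, which is why folding adds no runtime cost.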

Theorems & Definitions (9)

Definition 3.1: General Affine Transformation
Definition 3.2: Transformation Mean Squared Error
Theorem 3.3: MX Quantization Error
Lemma 1.2
proof
Lemma 1.3
proof
Proposition 5.1
proof
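Definition 3.1 concerns general affine transformations, which the TL;DR says are parameterized via LU or QR decompositions and trained with a volume-preserving regularizer. One way an LU-style parameterization could look is sketched below in plain Python. The specific choices (no permutation matrix, a positive diagonal obtained through exp, and a squared log-determinant penalty) are assumptions for illustration, not the paper's definitions, and the function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def build_lu_matrix(lower_params, upper_params, log_diag):
    """Assemble A = L @ U from unconstrained parameters (hypothetical scheme).

    L is unit lower-triangular and U is upper-triangular with diagonal exp(log_diag),
    so A is invertible for every parameter value and det(A) = exp(sum(log_diag)).
    """
    n = len(log_diag)
    lower = [[1.0 if i == j else (lower_params[i][j] if j < i else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[math.exp(log_diag[i]) if i == j else (upper_params[i][j] if j > i else 0.0)
              for j in range(n)] for i in range(n)]
    return matmul(lower, upper)


def log_abs_det(log_diag):
    """log|det A| = log det L + log det U = 0 + sum(log_diag)."""
    return sum(log_diag)


def volume_regularizer(log_diag):
    """Hypothetical penalty that is zero exactly when |det A| = 1 (volume preserving)."""
    return log_abs_det(log_diag) ** 2


# Example: a random 4x4 transform near the identity and its regularizer value.
rng = random.Random(0)
n = 4
lower_params = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(n)]
upper_params = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(n)]
log_diag = [rng.gauss(0.0, 0.1) for _ in range(n)]
A = build_lu_matrix(lower_params, upper_params, log_diag)
print(volume_regularizer(log_diag))
```

The appeal of this form is that invertibility and the determinant come for free from the triangular factors, so no matrix inverse or determinant has to be computed during training. In practice the parameters would be tensors updated by an autodiff framework together with the distillation loss; plain Python is used here only to keep the sketch dependency-free. A QR-style variant would replace one factor with an orthogonal matrix; that alternative is not shown.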
