LoaQ: Layer-wise Output Approximation Quantization

Li Lin; Xiaojun Wan

TL;DR

LoaQ addresses the misalignment between traditional layer-wise PTQ objectives, which focus on approximating weights or linear-layer outputs, and the full-model outputs of large language models (LLMs). It introduces output-matching factors at the linear-layer and sub-block levels and accounts for RMSNorm so that quantized outputs align with the originals. It also derives a simple closed-form update that is compatible with existing pipelines. Experiments on the LLaMA and Qwen families show robust gains for both weight-only and weight-activation quantization, with further improvements when LoaQ is combined with Hadamard transforms and NeUQI initialization. Across the evaluated tasks and settings, LoaQ consistently improves quantization quality, advancing layer-wise post-training quantization for large-scale models.
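
This summary does not spell out the closed-form update mentioned in the TL;DR. The following is therefore only a minimal NumPy sketch of a generic output-matching step under stated assumptions. It is not LoaQ's published update and omits the sub-block and RMSNorm-aware factors. Let X be full-precision calibration inputs to a linear layer and X_q the inputs that the partially quantized model actually feeds it. Here X_q is simulated by adding small Gaussian noise, which is an assumption. The target W* = argmin_V ||X W^T - X_q V^T||_F has a least-squares closed form. For any quantized Q, the matching loss then splits into a constant plus ||X_q (W* - Q)^T||_F^2, so W* can be handed to an ordinary Hessian-weighted quantizer. The helpers `rtn_quantize`, `output_matching_target`, and `matching_loss` are hypothetical names.

```python
import numpy as np


def rtn_quantize(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric per-output-channel round-to-nearest (RTN) fake quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero rows
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def output_matching_target(w: np.ndarray, x_fp: np.ndarray, x_q: np.ndarray) -> np.ndarray:
    """Hypothetical helper: W* = argmin_V ||x_fp @ w.T - x_q @ V.T||_F via least squares."""
    w_star_t, *_ = np.linalg.lstsq(x_q, x_fp @ w.T, rcond=None)
    return w_star_t.T  # back to (d_out, d_in) layout


def matching_loss(q: np.ndarray, w: np.ndarray, x_fp: np.ndarray, x_q: np.ndarray) -> float:
    """Squared Frobenius distance between the original output and the quantized one."""
    return float(np.sum((x_fp @ w.T - x_q @ q.T) ** 2))


rng = np.random.default_rng(0)
x_fp = rng.standard_normal((512, 64))                # full-precision calibration inputs
x_q = x_fp + 0.05 * rng.standard_normal(x_fp.shape)  # assumed stand-in for upstream quantization error
w = rng.standard_normal((32, 64))                    # (d_out, d_in) weight

w_star = output_matching_target(w, x_fp, x_q)
q = rtn_quantize(w_star)

# Least-squares orthogonality: loss(Q) = loss(W*) + ||x_q (W* - Q)^T||_F^2,
# so only the Hessian-weighted second term depends on Q.
lhs = matching_loss(q, w, x_fp, x_q)
rhs = matching_loss(w_star, w, x_fp, x_q) + float(np.sum((x_q @ (w_star - q).T) ** 2))
print(np.isclose(lhs, rhs))
```

In a real pipeline the `rtn_quantize` call would more plausibly be a GPTQ-style quantizer using H = X_q^T X_q, because that matrix is exactly the weighting of the Q-dependent term. RTN is used here only to keep the sketch short.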

Abstract

A natural and intuitive idea in model quantization is to make each component's quantized output match its original output. Motivated by this idea, most layer-wise post-training quantization (PTQ) methods focus on weight approximation at the linear-layer level. As a result, this local objective often yields insufficient approximations and deviates in practice from the guiding intuition. Recent work has improved the approximation of linear-layer outputs within the layer-wise PTQ framework, but such refinements remain inadequate for aligning with the full-model output. Drawing on a closer analysis of the structure of mainstream LLMs, we propose LoaQ, which incorporates output-matching factors when quantizing linear layers within the layer-wise PTQ framework. LoaQ better reflects this intuition and admits a simple closed-form solution, which makes it orthogonal to existing techniques and easy to integrate into existing quantization pipelines. Experiments on the LLaMA and Qwen model families show that LoaQ is effective for both weight-only and weight-activation quantization. Combined with existing quantization strategies, it further improves overall quantization quality and shows strong potential to advance the frontier of post-training quantization.
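
A toy example, not taken from the paper, illustrates the abstract's contrast between weight approximation and output approximation. It chooses a clipping ratio for 3-bit per-row RTN by minimizing either ||W - Q||_F^2 or ||X W^T - X Q^T||_F^2. The input distribution with a few large-magnitude channels, the ratio grid, and the `quantize_clipped` helper are illustrative assumptions. Neither objective reaches the full-model output that LoaQ targets; the sketch only shows that the choice of objective can change the quantization decision.

```python
import numpy as np


def quantize_clipped(w: np.ndarray, ratio: float, bits: int = 3) -> np.ndarray:
    """Hypothetical helper: symmetric per-row RTN with the scale shrunk by `ratio`."""
    qmax = 2 ** (bits - 1) - 1
    scale = ratio * np.abs(w).max(axis=1, keepdims=True) / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


rng = np.random.default_rng(0)
n, d_in, d_out = 1024, 64, 32
x = rng.standard_normal((n, d_in))
x[:, :4] *= 20.0  # assumption: a few large-magnitude input channels
w = rng.standard_normal((d_out, d_in))

ratios = np.linspace(0.5, 1.0, 11)
weight_err, output_err = [], []
for r in ratios:
    q = quantize_clipped(w, r)
    weight_err.append(float(np.sum((w - q) ** 2)))          # weight-approximation objective
    output_err.append(float(np.sum((x @ (w - q).T) ** 2)))  # linear-output objective

i_w = int(np.argmin(weight_err))
i_o = int(np.argmin(output_err))
print(f"weight objective picks ratio {ratios[i_w]:.2f}, "
      f"output objective picks ratio {ratios[i_o]:.2f}")
```

By construction, each objective's minimizer is never worse than the other's choice on its own objective. Because the output objective weights errors in the large-magnitude input columns far more heavily, it can prefer a different ratio than the weight objective does.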
