Task-Stratified Knowledge Scaling Laws for Post-Training Quantized Large Language Models
Chenxi Zhou, Pengfei Cao, Jiang Li, Bohan Yu, Jinyu Ye, Jun Zhao, Kang Liu
TL;DR
The paper addresses how post-training quantization affects diverse knowledge capabilities in LLMs by introducing Task-Stratified Knowledge Scaling Laws that jointly model model size, bit-width, calibration data, and quantization granularity. It defines a three-tier knowledge taxonomy (KM, KA, KR) and proposes a four-factor power-law framework with task-specific exponents, normalized accuracy, and a linearizable fitting procedure. Empirical validation across 293 PTQ configurations on Qwen3 and Llama-3 demonstrates strong fits (Adj. R^2 close to 0.95) and reveals distinct sensitivities: KR is precision-critical, KA scales with model size, and KM relies on calibration data, with phase-transition behavior at 2-bit. The work provides a practical, knowledge-aware quantization design space to mitigate performance collapse in low-bit regimes and offers cross-architecture generalization evidence essential for deployment decisions.
Abstract
Post-Training Quantization (PTQ) is a critical strategy for efficient Large Language Models (LLMs) deployment. However, existing scaling laws primarily focus on general performance, overlooking crucial fine-grained factors and how quantization differentially impacts diverse knowledge capabilities. To address this, we establish Task-Stratified Knowledge Scaling Laws. By stratifying capabilities into memorization, application, and reasoning, we develop a framework that unifies model size, bit-width, and fine-grained factors: group size and calibration set size. Validated on 293 diverse PTQ configurations, our framework demonstrates strong fit and cross-architecture consistency. It reveals distinct sensitivities across knowledge capabilities: reasoning is precision-critical, application is scale-responsive, and memorization is calibration-sensitive. We highlight that in low-bit scenarios, optimizing these fine-grained factors is essential for preventing performance collapse. These findings provide an empirically-backed foundation for designing knowledge-aware quantization strategies.
