Beyond Outliers: A Data-Free Layer-wise Mixed-Precision Quantization Approach Driven by Numerical and Structural Dual-Sensitivity

Hengyuan Zhang; Xinrong Chen; Zunhai Su; Xiao Liang; Jing Xiong; Wendong Xu; He Xiao; Chaofan Tao; Wei Zhang; Ruobing Xie; Lei Jiang; Hayden Kwok-Hay So; Ngai Wong

Abstract

Layer-wise mixed-precision quantization (LMPQ) enables effective compression under extreme low-bit settings by allocating higher precision to sensitive layers. However, existing methods typically treat all intra-layer weight modules uniformly and rely on a single numerical property when estimating sensitivity, overlooking their distinct operational roles and structural characteristics. To address this, we propose NSDS, a novel calibration-free LMPQ framework driven by Numerical and Structural Dual-Sensitivity. Specifically, it first mechanistically decomposes each layer into distinct operational roles and quantifies their sensitivity from both numerical and structural perspectives. These dual-aspect scores are then aggregated into a unified layer-wise metric through a robust aggregation scheme based on MAD-Sigmoid and Soft-OR to guide bit allocation. Extensive experiments demonstrate that NSDS consistently achieves superior performance compared to various baselines across diverse models and downstream tasks, without relying on any calibration data.
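The abstract names the two aggregation primitives, MAD-Sigmoid and Soft-OR, but gives no formulas here. The sketch below shows one plausible reading under our own assumptions; it is not the authors' implementation. MAD-Sigmoid is assumed to be a median/MAD z-score (MAD scaled by 1.4826) passed through a logistic sigmoid. Soft-OR is assumed to be the probabilistic OR, 1 - prod(1 - p). The helper names mad_sigmoid and soft_or and the raw NV/SE values are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mad_sigmoid(scores: Sequence[float], scale: float = 1.4826, eps: float = 1e-12) -> List[float]:
    """Assumed MAD-Sigmoid: robust z-score around the median, squashed into (0, 1).

    z_i = (x_i - median(x)) / (scale * MAD(x) + eps). scale=1.4826 makes the MAD
    comparable to a standard deviation for Gaussian data. Not the paper's code.
    """
    xs = list(scores)
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    denom = scale * mad + eps  # eps guards against MAD == 0
    return [_sigmoid((x - med) / denom) for x in xs]


def soft_or(probs: Sequence[float]) -> float:
    """Assumed Soft-OR (probabilistic OR): 1 - prod(1 - p)."""
    remaining = 1.0
    for p in probs:
        remaining *= 1.0 - p
    return 1.0 - remaining


# Hypothetical raw scores for four layers (illustrative only, not from the paper).
nv_raw = [0.2, 0.5, 3.0, 0.4]  # numerical vulnerability per layer
se_raw = [1.0, 4.0, 1.1, 0.9]  # structural expressiveness per layer
nv = mad_sigmoid(nv_raw)
se = mad_sigmoid(se_raw)
layer_scores = [soft_or([a, b]) for a, b in zip(nv, se)]
print([round(s, 3) for s in layer_scores])
```

Under these assumptions, a layer that is extreme on either view (layer 1 on SE, layer 2 on NV in the toy data) ends up close to 1. This is the behavior an OR-style combination provides. Using the median and MAD keeps a single extreme value from compressing every other layer's normalized score, which a mean/standard-deviation z-score would not.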

Paper Structure (57 sections, 31 equations, 10 figures, 3 tables, 1 algorithm)

Introduction
The NSDS Framework
  Mechanistic View of the Layer
    Multi-Head Attention (MHA).
    Feed-Forward Network (FFN).
    Prediction as a Sum of Component Outputs.
  Estimating Numerical Vulnerability (NV) and Structural Expressiveness (SE)
    Numerical Vulnerability (NV).
    Structural Expressiveness (SE).
  Score Aggregation and Bit Allocation
    Robust Normalization via MAD-Sigmoid.
    Score Aggregation via Soft-OR.
    Data-Free Layer-wise Bit Allocation.
Experiments
  Experiment Settings
...and 42 more sections

Figures (10)

Figure 1: Layer-wise sensitivity across two LLMs. Each dot represents a layer. Darker dots indicate more severe perplexity degradation (ΔPPL) when that layer is quantized. See § Estimating Numerical Vulnerability (NV) and Structural Expressiveness (SE) for how Numerical Vulnerability and Structural Expressiveness are calculated. Red dashed boxes highlight layers with low numerical but high structural sensitivity that still suffer significant degradation. See the appendix on NSDS scores for more results.
Figure 2: Overview of our NSDS framework for data-free layer sensitivity estimation. The left panel illustrates the mechanistic decomposition of a layer into distinct operational components, categorized as Detectors and Writers (§ Mechanistic View of the Layer). The middle panel presents the dual-view sensitivity estimation: Numerical Vulnerability (NV) and Structural Expressiveness (SE) (§ Estimating Numerical Vulnerability (NV) and Structural Expressiveness (SE)). The right panel shows the robust aggregation process, where MAD-Sigmoid normalizes heterogeneous scores and Soft-OR integrates them into a unified layer-wise sensitivity metric, which is then used for bit allocation under a target budget (§ Score Aggregation and Bit Allocation). See the appendix for the algorithmic description of the NSDS framework.
Figure 3: Average accuracy of NSDS and baselines on language reasoning benchmarks across different bit budgets for Llama-3.1-8B and Qwen2.5-7B.
Figure 4: Average accuracy of the ablation analysis on NSDS. "w/o" denotes the exclusion of a specific component from NSDS during layer sensitivity estimation.
Figure 5: Average accuracy comparison between NSDS framework and calibration-based baselines across generic reasoning benchmarks.
...and 5 more figures
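Figure 2 describes bit allocation under a target budget only at a high level. One simple way to turn unified layer scores into bit widths is a greedy pass that promotes the most sensitive layers from a low to a high bit width until an average-bit budget is used up. This is a hypothetical sketch. The two-level 2/4-bit scheme, the allocate_bits name, and the optional per-layer sizes weighting (for example, parameter counts) are all assumptions, and the paper's actual allocation rule may differ.

```python
from typing import List, Optional, Sequence


def allocate_bits(
    scores: Sequence[float],
    target_avg_bits: float,
    low_bits: int = 2,
    high_bits: int = 4,
    sizes: Optional[Sequence[int]] = None,
) -> List[int]:
    """Hypothetical greedy allocator: most sensitive layers get high_bits first.

    sizes weights each layer's cost (defaults to equal-size layers), so the
    constraint is sum(bits_i * size_i) <= target_avg_bits * sum(size_i).
    """
    n = len(scores)
    weights = list(sizes) if sizes is not None else [1] * n
    if target_avg_bits < low_bits:
        raise ValueError("budget is below the all-low-bit configuration")
    total = sum(weights)
    budget = target_avg_bits * total  # total bits the model may use
    bits = [low_bits] * n
    used = low_bits * total  # start with every layer at low_bits
    # Visit layers from most to least sensitive.
    for i in sorted(range(n), key=lambda j: scores[j], reverse=True):
        extra = (high_bits - low_bits) * weights[i]
        if used + extra <= budget:  # promote only if the budget still holds
            bits[i] = high_bits
            used += extra
    return bits


scores = [0.56, 0.99, 0.98, 0.59]  # hypothetical unified layer scores
print(allocate_bits(scores, target_avg_bits=3.0))  # [2, 4, 4, 2]
```

With equal layer sizes this reduces to promoting the top-k layers. With unequal sizes it is a greedy heuristic that skips a layer that does not fit and tries the next one, so it is not guaranteed to be the optimal knapsack solution.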
