Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Wenhua Cheng; Weiwei Zhang; Haihao Shen; Yiyang Cai; Xin He; Kaokao Lv; Yi Liu

TL;DR

The paper addresses the memory and storage cost of deploying large language models by proposing SignRound, a weight-only quantization method that uses signed gradient descent (SignSGD) to optimize rounding offsets and weight clipping. It combines the strengths of QAT and PTQ in a lightweight 200-step tuning process. SignRound introduces trainable parameters for rounding and clipping and uses block-wise output reconstruction to minimize a Frobenius-norm objective, without adding inference overhead. Across 7B–70B models, it achieves strong results at 2 to 4 bits, reaches near-lossless accuracy at 4 bits in most scenarios with model-specific hyperparameter tuning, and generalizes to recent models. The authors release public code and report that SignRound is faster and more accurate than state-of-the-art rounding methods and weight-only quantization baselines.
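
As a rough sketch of the objective described above, written for a single linear layer rather than a full block, and with notation assumed here rather than quoted from the paper ($V$ for the trainable rounding offsets, $\alpha$ and $\beta$ for the trainable clipping factors, $s$ for the scale, $zp$ for the zero point, $b$ for the bit width, $X$ for the layer input, $\eta$ for the step size):

```latex
\begin{aligned}
s &= \frac{\alpha \max(W) - \beta \min(W)}{2^{b} - 1}, \qquad \alpha, \beta \in [0, 1] \\
\widetilde{W} &= s \cdot \left( \mathrm{clip}\!\left( \left\lfloor \frac{W}{s} + V \right\rceil + zp,\; 0,\; 2^{b} - 1 \right) - zp \right), \qquad V \in [-0.5, 0.5] \\
\mathcal{L}(V, \alpha, \beta) &= \left\lVert W X - \widetilde{W} X \right\rVert_F^{2} \\
V &\leftarrow V - \eta \cdot \mathrm{sign}\!\left( \frac{\partial \mathcal{L}}{\partial V} \right)
\end{aligned}
```

Setting $V = 0$ and $\alpha = \beta = 1$ recovers plain round-to-nearest (RTN); the same signed update would apply to $\alpha$ and $\beta$.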

Abstract

Large Language Models (LLMs) have demonstrated exceptional proficiency in language-related tasks, but their deployment poses significant challenges due to substantial memory and storage requirements. Weight-only quantization has emerged as a promising solution, significantly reducing memory and storage needs without sacrificing too much performance. In this study, we introduce SignRound, a method that leverages signed gradient descent (SignSGD) to optimize rounding values and weight clipping in just 200 steps. SignRound integrates the advantages of Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ), delivering exceptional results across 2 to 4 bits while minimizing tuning costs and avoiding additional inference overhead. For example, SignRound achieved absolute average accuracy improvements ranging from 6.91% to 33.22% at 2 bits, as measured by the average zero-shot accuracy across 11 tasks. It also demonstrates strong generalization in recent models, achieving near-lossless 4-bit quantization in most scenarios. The source code is publicly available at https://github.com/intel/auto-round.
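
To make the idea of tuning rounding values with signed gradient descent concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes a single linear layer instead of a transformer block, per-row asymmetric quantization, a straight-through estimator for the rounding step, and a step size of 1/steps with linear decay. It tunes only the rounding offsets and leaves out the trainable clipping parameters. The names `quant_dequant`, `recon_loss`, and `sign_round` are hypothetical.

```python
import numpy as np

def quant_dequant(W, V, bits=4):
    """Per-row asymmetric quantize/dequantize with a rounding offset V."""
    qmax = 2 ** bits - 1
    w_min = W.min(axis=1, keepdims=True)
    w_max = W.max(axis=1, keepdims=True)
    scale = np.maximum((w_max - w_min) / qmax, 1e-8)  # guard against zero range
    zp = np.round(-w_min / scale)                     # integer zero point
    q = np.clip(np.round(W / scale + V) + zp, 0, qmax)
    return scale * (q - zp)

def recon_loss(W, Wq, X):
    """Squared Frobenius norm of the output reconstruction error."""
    return float(np.sum((W @ X - Wq @ X) ** 2))

def sign_round(W, X, bits=4, steps=200):
    """Tune rounding offsets V with signed gradient descent (sketch)."""
    V = np.zeros_like(W)                  # V = 0 is plain round-to-nearest
    best_V = V.copy()
    best_loss = recon_loss(W, quant_dequant(W, V, bits), X)
    G = X @ X.T                           # shared factor of the gradient
    for t in range(steps):
        lr = (1.0 / steps) * (1.0 - t / steps)   # assumed linear decay
        Wq = quant_dequant(W, V, bits)
        # Under the straight-through estimator, dL/dV equals
        # 2 * scale * (Wq - W) @ G; the positive factor 2 * scale does
        # not change the sign, so it is dropped. Clipped entries are
        # treated as pass-through too, which is a simplification.
        grad = (Wq - W) @ G
        V = np.clip(V - lr * np.sign(grad), -0.5, 0.5)
        loss = recon_loss(W, quant_dequant(W, V, bits), X)
        if loss < best_loss:              # keep the best offsets seen
            best_loss, best_V = loss, V.copy()
    return best_V, best_loss

# Toy data standing in for a layer's weights and calibration inputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))             # (out_features, in_features)
X = rng.normal(size=(32, 64))             # (in_features, n_samples)

rtn_loss = recon_loss(W, quant_dequant(W, np.zeros_like(W)), X)
V, tuned_loss = sign_round(W, X)
print(f"RTN loss: {rtn_loss:.3f}  tuned loss: {tuned_loss:.3f}")
```

Because the search starts from the round-to-nearest solution and keeps the best offsets seen, the returned loss can never be worse than RTN in this sketch. With the assumed schedule the step sizes sum to roughly 0.5, so each offset can travel across the whole [-0.5, 0.5] range within the 200 steps.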

Paper Structure (28 sections, 6 equations, 8 figures, 14 tables, 1 algorithm)

Introduction
Related Work
Quantization Aware Training.
Post-training Quantization (PTQ).
Large Language Models Quantization.
Weight Only Quantization.
Rounding Methods.
Signed Gradient Descent.
Methodology
Experiments
Experimental Settings
Evaluation and Tasks.
Quantization Configurations.
Large Language Models.
SignRound Hyperparameters.
...and 13 more sections

Figures (8)

Figure 1: An illustration of SignRound. Unlike the direct rounding in RTN, SignRound performs signed gradient descent to fine-tune the rounding and weight clipping through block-wise output reconstruction. After lightweight forward and backward steps, $\textbf{W}_{\text{INT4}}$ has been well optimized. Note that Quant and Dequant are two standard operations for quantization and dequantization respectively.
Figure : Mistral-7B, alpha values
Figure : Mistral-7B, alpha values
Figure : Llama-2-7B, alpha values
Figure : Mistral-7B, beta values
...and 3 more figures
