HLQ: Fast and Efficient Backpropagation via Hadamard Low-rank Quantization

Seonggon Kim; Eunhyeok Park

TL;DR

HLQ tackles the high cost of backpropagation when training large models by applying Hadamard-based quantization to activation gradients and Hadamard low-rank approximation to weight gradients, so that model quality is largely preserved. The method treats the two gradients, $g_w = \frac{1}{B} \bar{g}_y^T \cdot \bar{x}$ and $g_x = g_y \cdot w$, differently: it uses 4-bit Hadamard quantization (HQ) for $g_x$ and low-rank, int8-accelerated processing for $g_w$, combined with an int4 activation-compression strategy (ACBP). Empirical results show that HLQ achieves up to $2.5\times$ faster backpropagation and up to $78.5\%$ memory reduction while maintaining competitive or superior accuracy on CNNs and ViTs, in both training-from-scratch and fine-tuning settings, relative to baselines such as LBP-WHT and LUQ. The work offers practical training-cost reductions for resource-constrained environments and suggests future extensions to large language models.
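To make the two gradient expressions in the summary concrete, the following is a minimal sketch of linear-layer backpropagation with a Hadamard low-rank approximation of the weight gradient. It is an illustration under stated assumptions, not the authors' implementation: it uses the plain form $g_w = g_y^T \cdot x$ (the $\frac{1}{B}$ factor and the bar notation are omitted because their definitions are not given here), it projects along the batch dimension, and it keeps the first `rank` rows of a Sylvester-ordered Hadamard matrix purely for illustration. All function names are hypothetical.

```python
import math

def hadamard(n):
    """Orthonormal Sylvester-Hadamard matrix of order n (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = [[1.0]]
    while len(h) < n:
        # Sylvester step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def input_grad(g_y, w):
    """g_x = g_y . w, with g_y of shape (B, O) and w of shape (O, I)."""
    return matmul(g_y, w)

def weight_grad(g_y, x):
    """Exact g_w = g_y^T . x, with g_y of shape (B, O) and x of shape (B, I)."""
    return matmul(transpose(g_y), x)

def low_rank_weight_grad(g_y, x, rank):
    """Approximate g_w by projecting the batch dimension onto `rank` Hadamard rows.

    Assumption: the first `rank` rows in Sylvester order are kept; the actual
    basis-selection rule used by HLQ or LBP-WHT is not described in this summary.
    """
    h_r = hadamard(len(g_y))[:rank]          # shape (rank, B)
    g_y_small = matmul(h_r, g_y)             # shape (rank, O)
    x_small = matmul(h_r, x)                 # shape (rank, I)
    return matmul(transpose(g_y_small), x_small)
```

Because the Hadamard matrix is orthonormal, keeping all rows (`rank` equal to the batch size) recovers the exact weight gradient; smaller ranks shrink both operands of the matrix product, which is where the compute saving comes from.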

Abstract

With the rapid increase in model size and the growing importance of various fine-tuning applications, lightweight training has become crucial. Since the backward pass is twice as expensive as the forward pass, optimizing backpropagation is particularly important. However, modifications to this process can lead to suboptimal convergence, so training optimization should minimize perturbations, which is a highly challenging task. In this study, we introduce a novel optimization strategy called Hadamard Low-rank Quantization (HLQ), focusing on reducing the cost of backpropagation in convolutional and linear layers. We first analyze the sensitivity of gradient computation with respect to activation and weight, and judiciously design the HLQ pipeline to apply 4-bit Hadamard quantization to the activation gradient and Hadamard low-rank approximation to the weight gradient. This combination was found to be the best for maximizing benefits, and our extensive experiments demonstrate the outstanding performance of HLQ in both training from scratch and fine-tuning, achieving significant memory savings and acceleration on real GPUs with negligible quality degradation.
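As a rough illustration of the 4-bit Hadamard quantization mentioned in the abstract, the sketch below transforms a vector with an orthonormal Walsh-Hadamard transform, quantizes the result to int4, and transforms back. The details are assumptions made for the example and are not taken from the paper: a single per-tensor scale, symmetric codes in [-7, 7], round-to-nearest, and a transform over the whole vector. The function names are hypothetical.

```python
import math

def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(v)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    out = list(v)
    h = 1
    while h < n:
        # Butterfly stage: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]

def quantize_int4(v):
    """Symmetric per-tensor quantization; returns (codes in [-7, 7], scale)."""
    max_abs = max(abs(x) for x in v)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(x / scale))) for x in v]
    return codes, scale

def hadamard_quantize(v):
    """Quantize in the Hadamard domain, then map back to the original domain."""
    codes, scale = quantize_int4(fwht(v))
    # The orthonormal transform is its own inverse, so fwht also undoes it.
    return fwht([c * scale for c in codes])

# Example: a gradient-like vector with one large outlier.
grad = [0.1, -0.2, 0.05, 8.0, 0.0, 0.15, -0.1, 0.2]
reconstructed = hadamard_quantize(grad)
```

The transform spreads the single outlier across all coefficients before quantization. Since the transform is orthonormal, the reconstruction error in the original domain has the same L2 norm as the quantization error in the Hadamard domain.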

Paper Structure (19 sections, 3 equations, 3 figures, 5 tables)

Introduction
Related Work
Quantization for Training Optimization
Low-rank Approximation for Training Optimization
Preliminary
Backpropagation of Linear Layer
Hadamard Transform
Analysis on Hadamard Low-rank Approximation for Backpropagation
Limitation of LBP-WHT
Alternative Optimization Technique: Hadamard Quantization
Analysis on Hadamard Quantization for Backpropagation
HLQ: Hadamard Low-rank Quantization
Implementation Details
Experiments
Comparison to the Trained Model Quality with Optimization
...and 4 more sections

Figures (3)

Figure 1: Overview of the backward pass for (a) vanilla pass, (b) LBP-WHT, and (c) HLQ.
Figure 2: Histogram of output gradients (a) on value domain without Hadamard transform and (b) on frequency domain with Hadamard transform in ResNet-34 training on CIFAR-100.
Figure 3: The loss landscape visualization of ResNet-34 on CIFAR-100 with (a) naive int4 quantization, (b) int4 with HT, and (c) HLQ.
