HAAN: A Holistic Approach for Accelerating Normalization Operations in Large Language Models

Tianfan Peng; Jiajun Qin; Tianhua Xia; Sai Qian Zhang

TL;DR

This work tackles the normalization bottleneck in large language models with HAAN, a holistic algorithm-hardware co-design for accelerating normalization operations such as LayerNorm and RMSNorm. HAAN combines three ideas: it exploits cross-layer correlations in input statistics to predict or skip variance computations, it subsamples the input to reduce the workload of computing statistics, and it applies quantization to lower hardware cost. These are implemented in a reconfigurable accelerator built from an input statistics calculator, a square root inverter, and a normalization unit. Empirical results show substantial hardware efficiency gains (power savings over 60% and latency reductions around 20%) while accuracy stays within about 1% of FP32 baselines across multiple LLMs and tasks. The approach is supported by detailed ablations, hardware evaluations on an FPGA platform, and comparisons to existing normalization accelerators, indicating strong potential for improving end-to-end throughput in inference and training of large-scale transformers.
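To make the input-subsampling idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a plain strided subsample and omits the learned scale and shift. The function names and the `stride` parameter are hypothetical, introduced only for illustration. The mean and variance are estimated from a subset of the elements, and the full vector is then normalized with those estimates.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Reference LayerNorm over one token vector (no learned scale/shift)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


def layer_norm_subsampled(x, stride=4, eps=1e-5):
    """LayerNorm with statistics estimated from every `stride`-th element.

    `stride` is a hypothetical knob for this sketch; the subsampling
    scheme actually used by HAAN may differ.
    """
    sample = x[::stride]  # only these elements feed the statistics
    m = len(sample)
    mean = sum(sample) / m
    var = sum((v - mean) ** 2 for v in sample) / m
    inv_std = 1.0 / math.sqrt(var + eps)
    # Every element is still normalized, using the estimated statistics.
    return [(v - mean) * inv_std for v in x]


random.seed(0)
x = [random.gauss(0.5, 2.0) for _ in range(4096)]
exact = layer_norm(x)
approx = layer_norm_subsampled(x, stride=4)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
print(f"max abs deviation with stride=4: {max_err:.4f}")
```

With `stride=4`, the statistics pass reads a quarter of the elements, at the cost of a small deviation from the exact output. How much deviation a given model tolerates is an empirical question that the paper's accuracy results address.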

Abstract

Large language models (LLMs) have revolutionized natural language processing (NLP) tasks by achieving state-of-the-art performance across a range of benchmarks. Central to the success of these models is the integration of sophisticated architectural components aimed at improving training stability, convergence speed, and generalization capabilities. Among these components, normalization operations, such as layer normalization (LayerNorm), have emerged as a pivotal technique, offering substantial benefits to overall model performance. However, previous studies have indicated that normalization operations can substantially increase processing latency and energy usage. In this work, we adopt the principles of algorithm and hardware co-design, introducing a holistic normalization acceleration method named HAAN. The evaluation results demonstrate that HAAN can achieve significantly better hardware performance compared to state-of-the-art solutions.
