Post-Training Statistical Calibration for Higher Activation Sparsity

Vui Seng Chua; Yujie Pan; Nilesh Jain

TL;DR

This work addresses activation sparsity in large language models. Rather than relying on post-activation pruning along ReLU-based paths, it proposes a generalized post-training scheme that prunes the input activations of all FC layers in Transformer blocks. The method, Statistical Calibrated Activation Pruning (SCAP), features Mode-Centering, which aligns activation distributions so that $L_{1}$-thresholding is more effective, and a unified SCAP_FC kernel that avoids sparsity predictors. Empirically, SCAP achieves a better Pareto frontier than CATS (e.g., up to $48.5\%$ FFN sparsity with only a $1.5\%$ accuracy drop) and up to $1.5\times$ decoding speedup across several model families, including Mistral-7B and Llama-2-7B. It also extends to MoE, Mamba2, and Vision Transformers without retraining. Because it needs no retraining and its code is open source, the method is practical and scalable, enabling faster, more affordable deployment of large models on standard hardware.

Abstract

We present Statistical Calibrated Activation Pruning (SCAP), a post-training activation pruning framework that (1) generalizes sparsification by input activations of Fully-Connected layers for generic and flexible application across Transformers, and (2) features a simple Mode-Centering technique to pre-calibrate activation distributions for maximizing post-training sparsity. Our results demonstrate robust Pareto efficiency compared to prior methods, translating to a 1.5x additional LLM decoding speedup against CATS at iso model quality. SCAP effectiveness is empirically verified across a wide range of models, including recent Transformer Decoders, MoE, Mamba2, Encoding Transformer, and pre-quantized models, highlighting its practicality and scalability. The code is available at: https://github.com/IntelLabs/SCAP.
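
The two ideas summarized above, pruning the input activations of an FC layer and centering them on their mode before magnitude thresholding, can be illustrated with a small NumPy sketch. This is not the authors' SCAP_FC kernel. The function names (`estimate_mode`, `calibrate_threshold`, `sparse_fc`), the histogram-based mode estimate, and the quantile-based threshold calibration are all assumptions made for illustration. The sketch relies only on the identity $Wx + b = W(x - m) + (Wm + b)$, so the constant term $Wm + b$ can be folded into the bias and only the centered activations are thresholded.

```python
import numpy as np

def estimate_mode(samples, bins=64):
    """Per-channel mode estimate: center of the fullest histogram bin (assumed estimator)."""
    modes = np.empty(samples.shape[1])
    for j in range(samples.shape[1]):
        counts, edges = np.histogram(samples[:, j], bins=bins)
        k = np.argmax(counts)
        modes[j] = 0.5 * (edges[k] + edges[k + 1])
    return modes

def calibrate_threshold(samples, modes, target_sparsity):
    """Magnitude threshold below which about `target_sparsity` of centered entries fall."""
    return np.quantile(np.abs(samples - modes), target_sparsity)

def sparse_fc(x, W, b, modes, threshold):
    """Compute W @ prune(x - modes) + (W @ modes + b); also return the achieved sparsity."""
    centered = x - modes
    pruned = np.where(np.abs(centered) > threshold, centered, 0.0)
    folded_bias = W @ modes + b  # constant term, can be precomputed offline
    return pruned @ W.T + folded_bias, float(np.mean(pruned == 0.0))

rng = np.random.default_rng(0)
# Synthetic activations whose mode sits away from zero (hypothetical data).
calib = rng.normal(loc=2.0, scale=0.5, size=(4096, 16))
x = rng.normal(loc=2.0, scale=0.5, size=(256, 16))
W = rng.normal(size=(8, 16))
b = rng.normal(size=8)

modes = estimate_mode(calib)
threshold = calibrate_threshold(calib, modes, target_sparsity=0.5)

# Mode-centered pruning versus thresholding the raw activations at the same threshold.
y_sparse, sparsity = sparse_fc(x, W, b, modes, threshold)
_, naive_sparsity = sparse_fc(x, W, b, np.zeros(16), threshold)

# With a zero threshold nothing is pruned, so the dense output is recovered.
y_dense = x @ W.T + b
y_check, _ = sparse_fc(x, W, b, modes, 0.0)

print(f"centered sparsity: {sparsity:.2f}, uncentered sparsity: {naive_sparsity:.2f}")
```

On this synthetic data, the same threshold zeroes roughly half of the centered activations but almost none of the raw ones, which is the motivation for centering before thresholding. Any actual speedup depends on a kernel that skips the weight columns matching zeroed activations, which this sketch does not implement.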
