On the Impact of Calibration Data in Post-training Quantization and Pruning

Miles Williams; Nikolaos Aletras

TL;DR

This paper presents the first extensive empirical study on the effect of calibration data upon LLM performance, and makes a series of recommendations for the effective use of calibration data in LLM quantization and pruning.

Abstract

Quantization and pruning form the foundation of compression for neural networks, enabling efficient inference for large language models (LLMs). Recently, various quantization and pruning techniques have demonstrated remarkable performance in a post-training setting. They rely upon calibration data, a small set of unlabeled examples that are used to generate layer activations. However, no prior work has systematically investigated how the calibration data impacts the effectiveness of model compression methods. In this paper, we present the first extensive empirical study on the effect of calibration data upon LLM performance. We trial a variety of quantization and pruning methods, datasets, tasks, and models. Surprisingly, we find substantial variations in downstream task performance, contrasting existing work that suggests a greater level of robustness to the calibration data. Finally, we make a series of recommendations for the effective use of calibration data in LLM quantization and pruning.
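As a rough illustration of the experimental setup the abstract describes, the sketch below draws several calibration sets from one corpus using different random seeds and then summarizes a metric across them. The figure captions mention ten calibration sets sampled from C4; everything else here is an assumption for illustration only: the function names, the set size of 128, and the toy corpus. None of it is the authors' implementation.

```python
import random
import statistics


def sample_calibration_sets(corpus, num_sets=10, set_size=128, base_seed=0):
    """Draw `num_sets` calibration sets, each with its own seeded RNG.

    `set_size=128` and the seeding scheme are illustrative assumptions.
    """
    sets = []
    for i in range(num_sets):
        rng = random.Random(base_seed + i)  # one reproducible seed per set
        sets.append(rng.sample(corpus, set_size))  # sample without replacement
    return sets


def summarize(scores):
    """Return the mean and sample standard deviation of a metric."""
    return statistics.mean(scores), statistics.stdev(scores)


# Toy stand-in corpus; a real run would draw text segments from a dataset such as C4.
corpus = [f"document {i}" for i in range(1000)]
calibration_sets = sample_calibration_sets(corpus)
```

In such a setup, each calibration set would be passed to the same compression method, the compressed model evaluated once per set, and the resulting scores passed to `summarize` to report the spread attributable to the calibration data alone.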

Paper Structure (32 sections, 6 figures, 13 tables)


Introduction
Related Work
Model Compression
Sampling Calibration Data
Evaluating Compressed Models
Methodology
Model Compression
Quantization.
Pruning.
Evaluation Tasks
Calibration Data Sources
Models
Implementation Details
Data Sampling
Results & Analysis
...and 17 more sections

Figures (6)

Figure 1: Post-training compression methods rely upon calibration data to generate layer activations.
Figure 2: Distribution of accuracy across ten calibration sets sampled from C4 for the LLaMA family of models.
Figure 3: The perplexity on WikiText (L) and mean zero-shot accuracy (R) for LLaMA-7B with each compression method. We present the mean value and standard deviation (shaded) across ten calibration sets sampled from C4.
Figure 4: The distribution of mean zero-shot accuracy across all calibration sets for every configuration.
Figure 5: Distribution of accuracy across ten calibration sets sampled from C4 for the Vicuna family of models.
...and 1 more figure
