Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization

Francesco Pio Monaco; Elia Cunegatti; Flavio Vella; Giovanni Iacca

Abstract

Post-training model compression is essential for making Large Language Models (LLMs) more portable while preserving their performance. Although several compression approaches have been proposed, less attention has been paid to selecting the most suitable data (the so-called calibration data) for finding the compressed model configuration. The choice of calibration data is critical for preserving model capabilities both within and across tasks. In this work, we address the challenge of identifying high-performance calibration sets for both pruning and quantization by analyzing intrinsic data properties rather than model-specific signals. We introduce ZipCal, a model-agnostic data curation strategy that maximizes lexical diversity based on Zipfian power laws. Experiments demonstrate that our method consistently outperforms standard uniform random sampling across various pruning benchmarks. Notably, it also performs on par, in terms of downstream performance, with a state-of-the-art method that relies on model perplexity. The latter becomes prohibitively expensive for large models and datasets, whereas ZipCal is on average $\sim$240$\times$ faster thanks to its tractable linear complexity. The code and experiments are available at https://anonymous.4open.science/r/zipcal-71CD/.

Paper Structure (29 sections, 2 theorems, 2 equations, 6 figures, 8 tables, 2 algorithms)

Introduction
Data Curation Goals
Core Contributions
Related Work
Model Compression
Calibration Data
Zipf Sampling
Single-Domain Sampling
Multi-Domain Sampling
Experiments
Experimental Setup
Post-training Compressions
Experimental Environment
Baselines
Evaluation
...and 14 more sections

Key Result

Lemma 3.1

When ZipCal is used to extract a calibration set of $k$ samples from a dataset $\mathcal{D}$ of $n$ elements, the procedure completes in $O(nk)$ time.
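
To give some intuition for where an $O(nk)$ bound of this kind can come from, the sketch below shows a generic greedy vocabulary-coverage selection. It is not the ZipCal procedure, which this page does not spell out. The function name `greedy_diverse_subset`, the whitespace tokenization, and the "most unseen tokens" score are all assumptions made for illustration. The selection runs at most $k$ rounds, and each round scores every remaining sample once, so it performs on the order of $nk$ score evaluations (treating the cost of scoring one sample as bounded by its length).

```python
from typing import List, Sequence, Set


def greedy_diverse_subset(samples: Sequence[str], k: int) -> List[int]:
    """Pick up to k sample indices; each round takes the sample adding the most unseen tokens.

    Hypothetical illustration only: this is NOT the ZipCal algorithm.
    """
    # Assumption: plain lowercase whitespace tokenization stands in for a real tokenizer.
    token_sets = [set(text.lower().split()) for text in samples]
    covered: Set[str] = set()          # vocabulary covered by the chosen samples so far
    chosen: List[int] = []
    remaining = set(range(len(samples)))

    for _ in range(min(k, len(samples))):
        # One pass over the remaining samples: at most n score evaluations per round.
        # Ties on the gain are broken in favour of the lowest index.
        best = max(remaining, key=lambda i: (len(token_sets[i] - covered), -i))
        chosen.append(best)
        covered |= token_sets[best]
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    corpus = ["the cat sat", "the cat sat", "a dog ran fast", "the dog"]
    print(greedy_diverse_subset(corpus, k=2))
```

With at most $k$ rounds and at most $n$ candidates per round, the loop structure mirrors the shape of the bound in Lemma 3.1; the actual ZipCal scoring rule and its proof are given in the paper itself.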

Figures (6)

Figure 1: Token frequency distribution of the original datasets and the random, COLA, and ZipCal calibration sets.
Figure 2: Running time (log-scale) for calibration data selection. The COLA baseline is run for models of different sizes, whereas ZipCal is model-agnostic, so we report the measurement of its single run.
Figure 3: Effect of calibration data context length on model capabilities across compression techniques for LLaMA-3.1-8B-Instruct.
Figure 4: Effect of the number of calibration data samples on model capabilities across compression techniques for LLaMA-3.1-8B-Instruct.
Figure 5: Token frequency distribution of the original datasets and the random, COLA, and ZipCal sampling calibration sets using 16 samples.
...and 1 more figure

Theorems & Definitions (4)

Lemma 3.1
proof
Lemma 3.2
proof
