Effective Model Pruning
Yixuan Wang, Dan Guralnik, Saiedeh Akbari, Warren Dixon
TL;DR
Effective Model Pruning (EMP) develops a universal, context-agnostic pruning threshold that converts any score distribution into an adaptive budget $N_{\text{eff}} = \big\lfloor 1/\sum_i \omega_i^2 \big\rfloor$ with $\omega_i = |s_i|/\sum_j |s_j|$. By retaining the top $N_{\text{eff}}$ entries, EMP achieves sparse models with performance close to dense baselines across MLPs, CNNs, Transformers/LLMs, and KAN, without architecture-specific budgets or tuning. The approach is bolstered by a tight lower bound on the preserved mass $s_{\text{eff}}$ via simplex geometry and an upper bound on the loss change $\epsilon$ for magnitude-based pruning, plus an efficient $O(N \log N)$ algorithm with a tunable $\beta$ to meet hardware constraints. Empirically, EMP delivers robust pruning across FCs, CNNs, KAN, LLMs, and featurewise image pruning, achieving substantial sparsity with minimal performance degradation and often outperforming fixed-sparsity baselines when paired with magnitude criteria.
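As a small illustrative check of the budget formula, consider the score vector $s = (4, 3, 2, 1)$. This vector is made up for illustration and is not taken from the paper's experiments.

```latex
\omega = (0.4,\ 0.3,\ 0.2,\ 0.1), \qquad
\sum_i \omega_i^2 = 0.16 + 0.09 + 0.04 + 0.01 = 0.30, \qquad
N_{\text{eff}} = \big\lfloor 1/0.30 \big\rfloor = \lfloor 3.33\ldots \rfloor = 3 .
```

Keeping the top three entries retains $0.4 + 0.3 + 0.2 = 0.9$ of the normalized score mass. At the extremes, uniform scores over $N$ entries give $N_{\text{eff}} = N$, while a single nonzero score gives $N_{\text{eff}} = 1$.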
Abstract
We introduce Effective Model Pruning (EMP), a context-agnostic, parameter-free rule addressing a fundamental question about pruning: how many entries to keep. EMP does not prescribe how to score the parameters or prune the models; instead, it supplies a universal adaptive threshold that can be applied to any pruning criterion: weight magnitude, attention score, KAN importance score, or even feature-level signals such as image pixels. It can be used on structural components or on individual weights of a model. Given any score vector $s$, EMP maps $s$ to a built-in effective number $N_{\text{eff}}$, which is inspired by the inverse Simpson index of contributors. In our experiments, retaining the $N_{\text{eff}}$ highest-scoring entries and zeroing the remainder yields sparse models with performance comparable to the original dense networks across MLPs, CNNs, Transformers/LLMs, and KAN. By leveraging the geometry of the simplex, we derive a tight lower bound on the preserved mass $s_{\text{eff}}$ (the sum of retained scores) over the ordered probability simplex associated with the score vector $s$. We further verify the effectiveness of $N_{\text{eff}}$ by pruning models with a scaled threshold $\beta N_{\text{eff}}$ across a variety of criteria and models. Experiments suggest that the default $\beta = 1$ yields a robust threshold for model pruning, while $\beta \neq 1$ remains available as an optional adjustment to meet specific sparsity requirements.
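The rule described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names `effective_number` and `emp_prune` are hypothetical, weight magnitude is assumed as the default score, and the scaled budget is assumed to be $\lfloor \beta \cdot 1/\sum_i \omega_i^2 \rfloor$, which coincides with $N_{\text{eff}}$ at $\beta = 1$. The inverse Simpson index is computed as $(\sum_i |s_i|)^2 / \sum_i s_i^2$, which is algebraically the same as $1/\sum_i \omega_i^2$.

```python
import numpy as np


def effective_number(scores, beta=1.0):
    """Number of entries to keep: floor(beta / sum_i omega_i^2), clipped to [0, N].

    Hypothetical helper; the beta handling is an assumption (see lead-in).
    """
    a = np.abs(np.asarray(scores, dtype=np.float64)).ravel()
    total = a.sum()
    if total == 0.0:
        return 0  # all-zero scores: nothing worth keeping
    # (sum |s_i|)^2 / sum s_i^2 equals 1 / sum omega_i^2 with omega_i = |s_i| / sum_j |s_j|
    inv_simpson = total**2 / np.square(a).sum()
    k = int(np.floor(beta * inv_simpson))
    return max(0, min(k, a.size))


def emp_prune(weights, scores=None, beta=1.0):
    """Zero all but the top-k entries, where k comes from effective_number.

    If no scores are given, |weights| is used (magnitude criterion, assumed default).
    Returns the pruned array and the boolean keep-mask.
    """
    w = np.asarray(weights, dtype=np.float64)
    s = np.abs(w if scores is None else np.asarray(scores, dtype=np.float64)).ravel()
    k = effective_number(s, beta)
    mask = np.zeros(s.size, dtype=bool)
    if k > 0:
        # Sorting gives the O(N log N) cost; a stable sort keeps ties deterministic.
        keep = np.argsort(-s, kind="stable")[:k]
        mask[keep] = True
    mask = mask.reshape(w.shape)
    return w * mask, mask


if __name__ == "__main__":
    w = np.array([[4.0, -3.0], [2.0, 1.0]])
    pruned, mask = emp_prune(w)
    print(effective_number(w), pruned, mask, sep="\n")
```

Any other score (attention scores, importance scores, pixel-level signals) can be passed through the `scores` argument, as long as it has the same number of entries as `weights`. Values of `beta` below or above 1 shrink or enlarge the budget.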
