On Pruning State-Space LLMs
Tamer Ghattas, Michael Hassid, Roy Schwartz
TL;DR
This paper evaluates pruning for LLMs based on state-space models (SSMs), adapting both unstructured (WANDA) and structured pruning methods to the SSM components of four models and evaluating them on multiple tasks. Unstructured pruning is generally robust, and state pruning causes only small degradation in several cases, whereas head pruning severely degrades performance across models. Output projections are particularly sensitive to pruning, and the choice of pruning method strongly affects both efficiency and accuracy. Together, the results point toward practical, more efficient SSM-based LLMs, while underscoring the need to match the pruning method to the model and to explore all components further.
Abstract
Recent work has proposed state-space models (SSMs) as an efficient alternative to transformer-based LLMs. Can these models be pruned to further reduce their computation costs? We adapt several pruning methods to the SSM structure and apply them to four SSM-based LLMs across multiple tasks. We find that such models are quite robust to some pruning methods (e.g., WANDA), while other methods lead to rapid performance degradation.
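For readers unfamiliar with WANDA, the following is a minimal sketch of its scoring rule on a single generic linear layer: each weight is scored by its magnitude times the L2 norm of the corresponding input feature over a calibration set, and the lowest-scoring weights in each output row are zeroed. This is not the SSM-specific adaptation described in the paper; the function name `wanda_prune`, its arguments, and the toy data are illustrative assumptions.

```python
import math


def wanda_prune(weight, activations, sparsity=0.5):
    """Zero the lowest-scoring weights in each output row (illustrative sketch).

    weight:      list of rows, shape (out_features, in_features)
    activations: calibration inputs, shape (num_tokens, in_features)
    sparsity:    fraction of weights to remove from every row
    """
    in_features = len(weight[0])
    # L2 norm of each input feature across all calibration tokens.
    feature_norms = [
        math.sqrt(sum(token[j] ** 2 for token in activations))
        for j in range(in_features)
    ]
    n_prune = int(in_features * sparsity)
    pruned = []
    for row in weight:
        # WANDA score: |W_ij| * ||X_j||_2, compared within one output row.
        scores = [abs(w) * feature_norms[j] for j, w in enumerate(row)]
        # Indices of the n_prune lowest scores in this row.
        order = sorted(range(in_features), key=scores.__getitem__)
        drop = set(order[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy example: feature 0 has large activations, so its small weight survives,
# whereas plain magnitude pruning would have removed it.
weight = [[1.0, -2.0, 3.0, -4.0]]
activations = [[10.0, 1.0, 1.0, 1.0]]
print(wanda_prune(weight, activations, sparsity=0.5))  # [[1.0, 0.0, 0.0, -4.0]]
```

In the toy example the scores are 10, 2, 3, and 4, so the second and third weights are pruned even though the first weight has the smallest magnitude.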
