SPIN: Sparsifying and Integrating Internal Neurons in Large Language Models for Text Classification
Difan Jiao, Yilun Liu, Zhenwei Tang, Daniel Matter, Jürgen Pfeffer, Ashton Anderson
TL;DR
SPIN addresses a limitation of LLM-based text classification: reliance on the final layer's hidden states alone. It uses linear probing to select salient neurons in each layer, then integrates the selected neurons across layers into multi-grained features for a lightweight classification head, all without updating the LLM weights. The approach is model-agnostic and compatible with parameter-efficient fine-tuning. Across IMDb, SST-2, and EDOS, it shows consistent accuracy gains over baselines and performance competitive with fully fine-tuned models, along with faster training and inference and better interpretability, making SPIN a practical alternative for task-specific text classification with LLMs.
Abstract
Among the many tasks that Large Language Models (LLMs) have revolutionized is text classification. Current text classification paradigms, however, rely solely on the output of the final layer in the LLM, with the rich information contained in internal neurons largely untapped. In this study, we present SPIN: a model-agnostic framework that sparsifies and integrates internal neurons of intermediate layers of LLMs for text classification. Specifically, SPIN sparsifies internal neurons by linear probing-based salient neuron selection layer by layer, avoiding noise from unrelated neurons and ensuring efficiency. The cross-layer salient neurons are then integrated to serve as multi-layered features for the classification head. Extensive experimental results show our proposed SPIN significantly improves text classification accuracy, efficiency, and interpretability.
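The two steps named in the abstract, per-layer sparsification via linear probes and cross-layer integration, can be sketched roughly as follows. This is a minimal illustration in pure Python on synthetic activations, not the authors' implementation. The function names, the cumulative-|weight| selection rule with threshold `eta`, the gradient-descent probe, and the toy data standing in for frozen LLM activations are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(X, y, epochs=200, lr=0.5):
    """Fit a logistic-regression probe by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def select_salient(weights, eta=0.5):
    """Assumed rule: keep the fewest neurons whose |weight| sum reaches eta of the total."""
    order = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    total = sum(abs(w) for w in weights)
    kept, mass = [], 0.0
    for j in order:
        kept.append(j)
        mass += abs(weights[j])
        if mass >= eta * total:
            break
    return sorted(kept)


def integrate(layer_acts, salient):
    """Concatenate each sample's salient activations across layers."""
    n_samples = len(layer_acts[0])
    return [
        [acts[i][j] for acts, idx in zip(layer_acts, salient) for j in idx]
        for i in range(n_samples)
    ]


def accuracy(X, y, w, b):
    # Fraction of samples whose linear score lands on the correct side of zero.
    preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def run_demo(seed=0, n_samples=200, n_neurons=6, informative=(1, 4)):
    """Toy stand-in for frozen LLM activations: one informative neuron per layer."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    layer_acts = []
    for inf in informative:  # one entry per simulated layer
        acts = [[rng.gauss(0.0, 1.0) for _ in range(n_neurons)] for _ in range(n_samples)]
        for row, label in zip(acts, y):
            row[inf] = (2 * label - 1) + rng.gauss(0.0, 0.3)
        layer_acts.append(acts)
    # Sparsify: probe each layer separately and keep its salient neurons.
    salient = [select_salient(train_linear_probe(acts, y)[0]) for acts in layer_acts]
    # Integrate: concatenate salient neurons across layers, then fit the head.
    features = integrate(layer_acts, salient)
    w, b = train_linear_probe(features, y)
    return salient, features, accuracy(features, y, w, b)


if __name__ == "__main__":
    salient, features, acc = run_demo()
    print("salient neurons per layer:", salient)
    print("integrated feature width:", len(features[0]))
    print("head training accuracy:", round(acc, 3))
```

In this sketch the backbone is never touched: only the per-layer probes and the final head are trained, which mirrors the frozen-weights setting described above. With a real model, the synthetic `layer_acts` would be replaced by pooled hidden states extracted from each intermediate layer.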
