Task-Specific Data Selection for Instruction Tuning via Monosemantic Neuronal Activations

Da Ma; Gonghu Shang; Zhi Chen; Libo Qin; Yijie Luo; Lei Pan; Shuai Fan; Lu Chen; Kai Yu

TL;DR

MoNA tackles the data-selection bottleneck in task-specific instruction tuning by introducing a model-centric representation of samples through monosemantic neuronal activations. A sparse autoencoder disentangles polysemantic neuron activations into sparse, interpretable units, and a tailored generalized Jaccard similarity measures similarity to a task prototype formed from target exemplars. Across multiple datasets, models, and data ratios, MoNA consistently outperforms baselines in both stability and task-specific performance, demonstrating the value of aligning data selection with internal model dynamics. The approach offers a scalable, semantically expressive data curation mechanism with potential extensions to pretraining and multimodal data.

Abstract

Instruction tuning improves the ability of large language models (LLMs) to follow diverse human instructions, but achieving strong performance on specific target tasks remains challenging. A critical bottleneck is selecting the most relevant data to maximize task-specific performance. Existing data selection approaches include unstable influence-based methods and more stable distribution alignment methods, the latter of which critically rely on the underlying sample representation. In practice, most distribution alignment methods, from shallow features (e.g., BM25) to neural embeddings (e.g., BGE, LLM2Vec), may fail to capture how the model internally processes samples. To bridge this gap, we adopt a model-centric strategy in which each sample is represented by its neuronal activation pattern in the model, directly reflecting internal computation. However, directly using raw neuron activations leads to spurious similarity between unrelated samples due to neuron polysemanticity, where a single neuron may respond to multiple, unrelated concepts. To address this, we employ sparse autoencoders to disentangle polysemantic activations into sparse, monosemantic representations, and introduce a dedicated similarity metric for this space to better identify task-relevant data. Comprehensive experiments across multiple instruction datasets, models, tasks, and selection ratios show that our approach consistently outperforms existing data selection baselines in both stability and task-specific performance.
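The selection step described above (score each candidate by its similarity to a task prototype built from target exemplars, then keep the top fraction) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the standard generalized (weighted) Jaccard similarity, sum of element-wise minima over sum of element-wise maxima, rather than the tailored variant the paper introduces, and it assumes the prototype is the element-wise mean of the exemplar activations. The function names and the toy activation vectors are hypothetical; in practice the vectors would be the sparse, non-negative feature activations produced by a sparse autoencoder.

```python
from typing import List, Sequence, Tuple


def generalized_jaccard(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard generalized Jaccard for non-negative vectors: sum(min) / sum(max)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    # Two all-zero vectors share no active feature; define their similarity as 0.
    return numerator / denominator if denominator > 0 else 0.0


def mean_prototype(exemplars: Sequence[Sequence[float]]) -> List[float]:
    """Assumption: the task prototype is the element-wise mean of the exemplars."""
    n = len(exemplars)
    return [sum(column) / n for column in zip(*exemplars)]


def select_top_ratio(
    candidates: Sequence[Sequence[float]],
    prototype: Sequence[float],
    ratio: float,
) -> Tuple[List[int], List[float]]:
    """Return indices of the highest-scoring candidates and all scores."""
    scores = [generalized_jaccard(c, prototype) for c in candidates]
    k = max(1, round(ratio * len(candidates)))
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(range(len(candidates)), key=lambda i: -scores[i])
    return ranked[:k], scores


# Toy sparse activations over four hypothetical monosemantic features.
target_exemplars = [[2.0, 0.0, 1.0, 0.0], [4.0, 0.0, 1.0, 0.0]]
candidates = [
    [3.0, 0.0, 1.0, 0.0],  # same active features as the prototype
    [0.0, 5.0, 0.0, 2.0],  # disjoint active features
    [1.0, 0.0, 1.0, 1.0],  # partial overlap
    [0.0, 0.0, 0.0, 0.0],  # no active features
]

prototype = mean_prototype(target_exemplars)
selected, scores = select_top_ratio(candidates, prototype, ratio=0.5)
print(selected, scores)
```

Because the score depends only on which features are active and how strongly, candidates whose active features are disjoint from the prototype's score exactly zero. This is the property that makes sparse, monosemantic representations attractive here compared with dense raw activations, where polysemantic neurons can produce overlap between unrelated samples.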
