Task-Specific Data Selection for Instruction Tuning via Monosemantic Neuronal Activations

Da Ma, Gonghu Shang, Zhi Chen, Libo Qin, Yijie Luo, Lei Pan, Shuai Fan, Lu Chen, Kai Yu

TL;DR

MoNA tackles the data-selection bottleneck in task-specific instruction tuning by representing each sample through its monosemantic neuronal activations, a model-centric view of the data. A sparse autoencoder disentangles polysemantic neuron activations into sparse, interpretable units, and a tailored generalized Jaccard metric scores each sample against a task prototype formed from target exemplars. Across multiple datasets, models, and data ratios, MoNA consistently outperforms baselines in both stability and task-specific performance, demonstrating the value of aligning data selection with internal model dynamics. The approach offers a scalable, semantically expressive data curation mechanism with potential extensions to pretraining and multimodal data.

Abstract

Instruction tuning improves the ability of large language models (LLMs) to follow diverse human instructions, but achieving strong performance on specific target tasks remains challenging. A critical bottleneck is selecting the most relevant data to maximize task-specific performance. Existing data selection approaches include unstable influence-based methods and more stable distribution alignment methods, the latter of which critically rely on the underlying sample representation. In practice, most distribution alignment methods, from shallow features (e.g., BM25) to neural embeddings (e.g., BGE, LLM2Vec), may fail to capture how the model internally processes samples. To bridge this gap, we adopt a model-centric strategy in which each sample is represented by its neuronal activation pattern in the model, directly reflecting internal computation. However, directly using raw neuron activations leads to spurious similarity between unrelated samples due to neuron polysemanticity, where a single neuron may respond to multiple, unrelated concepts. To address this, we employ sparse autoencoders to disentangle polysemantic activations into sparse, monosemantic representations, and introduce a dedicated similarity metric for this space to better identify task-relevant data. Comprehensive experiments across multiple instruction datasets, models, tasks, and selection ratios show that our approach consistently outperforms existing data selection baselines in both stability and task-specific performance.
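The selection step described above can be illustrated with a minimal sketch. It assumes each sample has already been mapped to a non-negative sparse activation vector, uses the standard generalized (Ruzicka) Jaccard similarity rather than the paper's tailored variant, and takes the mean of the target exemplars as the task prototype. The function names, the mean-prototype choice, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def generalized_jaccard(a, b, eps=1e-12):
    """Standard generalized Jaccard: sum(min(a, b)) / sum(max(a, b)).

    Assumption: inputs are non-negative activation vectors. The paper's
    tailored metric may differ from this textbook form.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if np.any(a < 0) or np.any(b < 0):
        raise ValueError("generalized Jaccard expects non-negative vectors")
    denom = np.maximum(a, b).sum()
    if denom < eps:  # both vectors are all zeros
        return 0.0
    return float(np.minimum(a, b).sum() / denom)


def select_top_fraction(source, prototype, ratio):
    """Rank source samples by similarity to the prototype and keep the top fraction."""
    scores = np.array([generalized_jaccard(x, prototype) for x in source])
    k = max(1, int(round(ratio * len(source))))
    order = np.argsort(-scores, kind="stable")[:k]  # highest similarity first
    return order, scores


# Toy data (hypothetical): 4-dimensional sparse activations.
target = np.array([[1.0, 0.0, 2.0, 0.0],
                   [3.0, 0.0, 2.0, 0.0]])
prototype = target.mean(axis=0)  # assumed prototype: mean of target exemplars

source = np.array([[2.0, 0.0, 2.0, 0.0],   # same active units as the prototype
                   [0.0, 5.0, 0.0, 1.0],   # disjoint active units
                   [1.0, 0.0, 1.0, 0.0]])  # same units, weaker activation

selected, scores = select_top_fraction(source, prototype, ratio=2 / 3)
print(selected, scores)
```

In this toy case the sample with disjoint active units scores zero and is dropped, which is the behavior a sparse, monosemantic representation is meant to make reliable.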

Paper Structure

This paper contains 53 sections, 11 equations, 7 figures, 13 tables, 1 algorithm.

Figures (7)

  • Figure 1: (a) An example of neural coactivation in the brain [quiroga2005invariant]. (b) Disentangling polysemantic activations into monosemantic representations via a sparse autoencoder (SAE). (c) Improved data selection using monosemantic activations (more details are in the appendix).
  • Figure 2: Workflow of MoNA. Left: Distribution alignment pipeline between the source dataset and the target task. Right: Computation of monosemantic neuronal activation embeddings and the proposed similarity metric. Top: Application of SAE; Bottom right: Aggregation of token embeddings into a sentence-level embedding; Bottom left: Calculation of similarity between two samples
  • Figure 3: Performance of different data selection methods under varying selection ratios, evaluated on Less with LLaMA3.1-8B.
  • Figure 4: Neuron activation profiles for $100$ Math and $100$ Code samples on the top-$100$ most variant neurons. Faint lines show individual samples; bold lines show task means. In the polysemantic (top) plot, many neurons, especially those with high activation peaks (marked with a weeping-face icon), are activated by both tasks at once, reflecting pronounced overlap and limited task specificity. In contrast, the monosemantic (bottom) plot reveals clear task-specific activation patterns.
  • Figure 5: LLM as a Data Analyst: scores for data selected by different methods. Higher scores indicate better performance.
  • ...and 2 more figures
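
The embedding computation outlined in the Figure 2 caption (applying the SAE to token activations, then aggregating token embeddings into a sentence-level embedding) might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the ReLU encoder with top-k sparsification, the max-pooling aggregation, the random weights, and the names `sae_encode` and `aggregate_tokens`. The paper's actual SAE architecture and pooling rule may differ.

```python
import numpy as np


def sae_encode(h, W_enc, b_enc, k):
    """Encode token activations with a ReLU encoder and keep the k largest
    latent units per token (a top-k SAE encoder; assumed, not from the paper).

    h:     (num_tokens, d_model) hidden activations
    W_enc: (d_model, d_latent) encoder weights
    b_enc: (d_latent,) encoder bias
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    z = np.maximum(h @ W_enc + b_enc, 0.0)  # non-negative latent activations
    if k < z.shape[1]:
        # Indices of everything except the k largest entries in each row.
        drop = np.argpartition(z, -k, axis=1)[:, :-k]
        np.put_along_axis(z, drop, 0.0, axis=1)
    return z


def aggregate_tokens(z):
    """Pool token-level latents into one sentence-level vector.

    Max pooling is an assumed choice; it keeps a unit active if any token
    activates it.
    """
    return z.max(axis=0)


# Hypothetical shapes and random weights, only to show the data flow.
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))        # 5 tokens, model width 8
W_enc = rng.normal(size=(8, 32))   # 32 latent units
b_enc = np.zeros(32)

z = sae_encode(h, W_enc, b_enc, k=4)
sentence_embedding = aggregate_tokens(z)
print(sentence_embedding.shape)
```

The resulting vector is non-negative and sparse, which is what lets a min/max-style similarity such as generalized Jaccard be applied to it.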