Diversity-driven Data Selection for Language Model Tuning through Sparse Autoencoder

Xianjun Yang, Shaoliang Nie, Lijuan Liu, Suchin Gururangan, Ujjwal Karn, Rui Hou, Madian Khabsa, Yuning Mao

TL;DR

This work tackles the challenge of measuring and leveraging data diversity for instruction tuning by introducing sparse autoencoders (SAEs) as a diversity signal. It develops two scalable data-selection algorithms, SAE-GreedSelect and SAE-SimScale, grounded in SAE activations to curate instruction-following data, and demonstrates that models fine-tuned on SAE-selected data surpass baselines on multiple datasets and model scales. The approach provides interpretability by linking longer instruction-response data to richer SAE features and shows strong performance while reducing data and compute requirements. The study further demonstrates robustness across base models and SAE configurations, offering a scalable path for industrial data pruning and data-centric AI practices.

Abstract

Instruction-tuning data are often quantity-saturated because of large-scale data collection and fast model iteration, which makes data selection important yet underexplored. Existing quality-driven data selection methods, such as LIMA (Zhou et al., NeurIPS 2023) and AlpaGasus (Chen et al., ICLR 2024), generally overlook the equally important factors of data diversity and complexity. In this work, we design a diversity-aware data selection strategy and propose using sparse autoencoders (SAEs) to address the challenge of measuring data diversity. SAEs also make model behavior more interpretable and can explain, for example, the surprising effectiveness of selecting the longest response (Zhao et al., ICML 2024). We show experimentally that models trained on our selected data can outperform other methods in terms of model capabilities, reduce training cost, and potentially offer more control over model behaviors. Our results indicate that SAEs can serve as a good alternative for measuring diversity, and we design our method to scale to potential industrial large-scale pruning. We will also release our trained SAEs for use by the broader community.
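
To make the idea of SAE activations as a diversity signal concrete, the following is a minimal sketch of greedy selection by feature coverage. It is an illustration under stated assumptions, not the paper's exact algorithm: the helper name `greedy_feature_coverage`, the set-based representation of active features, and the toy data are all hypothetical, and the summary above does not specify how SAE-GreedSelect or SAE-SimScale score examples.

```python
from typing import Dict, List, Set


def greedy_feature_coverage(active: Dict[str, Set[int]], budget: int) -> List[str]:
    """Pick up to `budget` examples, each time taking the one that adds the
    most SAE features not yet covered by the selection (hypothetical helper)."""
    covered: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(active)
    while remaining and len(chosen) < budget:
        # Example contributing the largest number of unseen features;
        # iterating over sorted ids makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda k: len(remaining[k] - covered))
        if not remaining[best] - covered:
            break  # no remaining example adds a new feature
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen


# Toy input (made up): example id -> indices of SAE features that fire on it.
toy = {
    "a": {1, 2, 3},
    "b": {2, 3},
    "c": {4, 5},
    "d": {1, 5, 6, 7},
}
selected = greedy_feature_coverage(toy, budget=2)
print(selected)
```

In this toy run, example "b" is never chosen because every feature it activates is already covered by "a", which is the sense in which a coverage criterion favors diverse rather than redundant data.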

Paper Structure

This paper contains 22 sections, 3 equations, 12 figures, 6 tables, 1 algorithm.

Figures (12)

  • Figure 1: Training loss of the TopK-SAE on layer 31 of Llama-3.1-8b-instruct.
  • Figure 2: The correlation between text length and number of activations in SAEs.
  • Figure 3: Benchmark performance of different methods: Llama 2 (13B) trained on the 1k examples each method selects from Alpaca.
  • Figure 4: The comparison of SAE-GreedSelect (Top) and SAE-SimScale (Bottom) under different SAE thresholds.
  • Figure 5: Head-to-head comparison of Llama-2-13b-base models trained on different data selected from Alpaca.
  • ...and 7 more figures