SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection

Liangxin Liu, Xuebo Liu, Derek F. Wong, Dongfang Li, Ziyi Wang, Baotian Hu, Min Zhang

TL;DR

SelectIT is an uncertainty-aware framework for selecting high-quality instruction-tuning (IT) data without extra models or data. It scores IT samples through token-, sentence-, and model-level self-reflection. Applying SelectIT to Alpaca-GPT4 produces Selective Alpaca, a compact dataset that yields substantial performance gains across multiple foundation models and domains, including reasoning and multilingual tasks. The results suggest that longer, more computation-intensive IT data may be particularly effective, and the method remains robust across models and datasets while making data selection more efficient. The authors release open-source tooling, and the approach could reduce resource costs while improving LLM alignment and capability.

Abstract

Instruction tuning (IT) is crucial to tailoring large language models (LLMs) towards human-centric interactions. Recent advancements have shown that the careful selection of a small, high-quality subset of IT data can significantly enhance the performance of LLMs. Despite this, common approaches often rely on additional models or data, which increases costs and limits widespread adoption. In this work, we propose a novel approach, termed SelectIT, that capitalizes on the foundational capabilities of the LLM itself. Specifically, we exploit the intrinsic uncertainty present in LLMs to more effectively select high-quality IT data, without the need for extra resources. Furthermore, we introduce a curated IT dataset, the Selective Alpaca, created by applying SelectIT to the Alpaca-GPT4 dataset. Empirical results demonstrate that IT using Selective Alpaca leads to substantial model ability enhancement. The robustness of SelectIT has also been corroborated in various foundation models and domain-specific tasks. Our findings suggest that longer and more computationally intensive IT data may serve as superior sources of IT, offering valuable insights for future research in this area. Data, code, and scripts are freely available at https://github.com/Blue-Raincoat/SelectIT.

Paper Structure (37 sections, 5 equations, 8 figures, 11 tables)

Figures (8)

  • Figure 1: Existing advanced data selection strategies rely heavily on external models or data; however, SelectIT effectively overcomes this limitation.
  • Figure 2: Overall framework of SelectIT. In Token-level Self-Reflection, we employ the foundation model to rate the IT data from 1 to $K$. In Sentence-level Self-Reflection, we leverage the uncertainty of varied prompts on LLMs to enhance the rating process. In Model-level Self-Reflection, we harness uncertainty among different LLMs to facilitate a collaborative decision-making process in selecting IT data. Finally, different levels of self-reflection are reasonably combined into SelectIT, which can effectively select high-quality IT data without relying on additional resources.
  • Figure 3: Comparison of LLM abilities with varying Alpaca proportions.
  • Figure 4: Instruction embedding representations under different selection strategies. The red and blue points represent the full Alpaca dataset and the selected data, respectively.
  • Figure 5: Left: The average length of samples. Right: The proportion of calculation type.
  • ...and 3 more figures
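
The three levels of self-reflection described in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: the caption only states that the foundation model rates each sample from 1 to $K$, that uncertainty across varied prompts refines the rating, and that several LLMs decide jointly. The specific formulas below (an expected rating over the $K$ rating tokens, a penalty based on the standard deviation across prompts, and a weighted average across models), as well as the names `token_score`, `sentence_score`, `model_score`, `select_top`, and the parameter `alpha`, are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def token_score(probs):
    """Token level (assumed form): expected rating under the model's
    probability distribution over the rating tokens 1..K."""
    total = sum(probs)
    return sum((k + 1) * p for k, p in enumerate(probs)) / total


def sentence_score(per_prompt_probs, alpha=0.2):
    """Sentence level (assumed form): average the token-level scores obtained
    with different rating prompts, then shrink the result when the prompts
    disagree. `alpha` is a hypothetical penalty strength."""
    scores = [token_score(p) for p in per_prompt_probs]
    return mean(scores) / (1.0 + alpha * pstdev(scores))


def model_score(per_model_scores, weights):
    """Model level (assumed form): weighted average of the sentence-level
    scores produced by several LLMs."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, per_model_scores)) / total


def select_top(samples, scores, fraction):
    """Keep the highest-scoring fraction of samples (ties keep input order)."""
    keep = max(1, round(len(samples) * fraction))
    ranked = sorted(range(len(samples)), key=lambda i: (-scores[i], i))
    return [samples[i] for i in ranked[:keep]]
```

In use, each IT sample would receive one `model_score`, and `select_top` would retain a chosen fraction of the dataset. The rating probabilities themselves would come from the LLMs' next-token distributions, which this sketch takes as given inputs.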