Pay More Attention to the Robustness of Prompt for Instruction Data Mining

Qiang Wang, Dawei Feng, Xu Zhang, Ao Shen, Yang Xu, Bo Ding, Huaimin Wang

TL;DR

This work addresses how prompt robustness affects the selection of high-quality instruction data for instruction tuning. It proposes a diamond data mining framework that leverages two self-guided methods, Adversarial Instruction-Following Difficulty (AIFD) and Adversarial Instruction Output Embedding Consistency (AIOEC), to mine online instruction data under adversarial prompt perturbations. The authors define new metrics and embedding-based measures, and validate them through extensive experiments on offline and online data with LLaMA-7B and LLaMA2-7B, showing consistent improvements over prior instruction data mining approaches. The findings highlight the practical importance of accounting for prompt robustness in instruction data mining to improve downstream instruction-tuning performance.

Abstract

Instruction tuning has emerged as a key method for tailoring the behavior of LLMs. Recent work has shown that LLMs can achieve high performance through fine-tuning on a limited quantity of high-quality instruction data. Building on this approach, we further explore the impact of prompt robustness on the selection of high-quality instruction data. This paper proposes a framework for mining high-quality online instruction data for instruction tuning, focusing on the impact of prompt robustness on the data mining process. Our main innovation is to generate adversarial instruction data by attacking the prompts of online instruction data. We then introduce an Adversarial Instruction-Following Difficulty metric to measure how much the adversarial instruction data helps in generating the corresponding response. In addition, we propose a novel Adversarial Instruction Output Embedding Consistency approach to select high-quality online instruction data. We conduct extensive experiments on two benchmark datasets to assess performance. The experimental results demonstrate the effectiveness of our two proposed methods and underscore the practical significance of considering prompt robustness.
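
The abstract does not give the formula for the Adversarial Instruction-Following Difficulty metric, so the following is only a minimal sketch of one plausible form. It assumes a ratio of the response loss conditioned on the adversarial prompt to the response loss with no prompt. The function names and the per-token log-probabilities are hypothetical; in practice the log-probabilities would come from the language model being tuned.

```python
from statistics import fmean


def mean_nll(token_logprobs):
    """Average negative log-likelihood over the response tokens."""
    return -fmean(token_logprobs)


def difficulty_ratio(logprobs_with_prompt, logprobs_without_prompt):
    """Assumed ratio form: conditioned loss divided by unconditioned loss.

    A value below 1 means the (adversarial) prompt makes the response
    easier to generate; a value near or above 1 means it helps little.
    """
    return mean_nll(logprobs_with_prompt) / mean_nll(logprobs_without_prompt)


# Hypothetical per-token log-probabilities of the same response,
# scored once with the adversarial prompt and once with no prompt.
with_adv_prompt = [-0.5, -1.0, -1.5]
without_prompt = [-1.0, -2.0, -3.0]

score = difficulty_ratio(with_adv_prompt, without_prompt)
print(score)
```

In this toy case the adversarial prompt halves the average loss on the response, so the ratio is well below 1.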

Paper Structure

This paper contains 18 sections, 5 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Illustration of the overall framework for mining diamond samples from online instruction data for instruction tuning.
  • Figure 2: Illustration of the Adversarial Instruction Output Embedding Consistency method for diamond data mining. The method consists of three steps: generating adversarial instruction output embeddings, measuring embedding consistency, and mining diamond data.
  • Figure 3: Performance comparison of LLaMA2-7B fine-tuned on 10% and 20% of the WizardLM70K dataset across different tasks.
  • Figure 4: Visualization using t-SNE on instruction embeddings from the Alpaca dataset.
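
The three steps named in the Figure 2 caption (output embeddings, consistency measurement, mining) can be illustrated with a small sketch. This is not the authors' implementation. It assumes cosine similarity as the consistency measure, uses hand-written toy vectors in place of real model output embeddings, and treats the selection direction (keeping the least consistent samples) as a configurable guess, since none of these details are stated here. All names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def consistency(original_emb, adversarial_embs):
    """Mean cosine similarity between the original output embedding and
    the output embeddings obtained under adversarial prompts."""
    sims = [cosine(original_emb, e) for e in adversarial_embs]
    return sum(sims) / len(sims)


def mine(scores, k, pick_lowest=True):
    """Return the ids of the k samples ranked by consistency score.

    pick_lowest=True keeps the least consistent samples; which direction
    yields the best data is an assumption, so it is left as a flag.
    """
    ranked = sorted(scores, key=scores.get, reverse=not pick_lowest)
    return ranked[:k]


# Step 1 (stand-in): toy output embeddings; real ones would come from the model.
samples = {
    "a": ([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]]),  # output unchanged under attack
    "b": ([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]]),  # output shifts under one attack
}

# Step 2: consistency score per sample.
scores = {sid: consistency(orig, adv) for sid, (orig, adv) in samples.items()}

# Step 3: select samples by score.
selected = mine(scores, k=1)
print(scores, selected)
```

Keeping the ranking direction as a flag makes it easy to compare both selection policies on a held-out set before committing to one.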