Contrastive Instruction Tuning

Tianyi Lorena Yan, Fei Wang, James Y. Huang, Wenxuan Zhou, Fan Yin, Aram Galstyan, Wenpeng Yin, Muhao Chen

TL;DR

This work tackles the lack of robustness in instruction-tuned LLMs to variations in user instructions. It introduces Contrastive Instruction Tuning (CoIN), which uses paraphrase-based positives and same-instruction, different-input negatives to align hidden representations of semantically equivalent instruction–instance pairs, optimizing a combined loss that pairs generation with a temperature-scaled contrastive term. Evaluated on PromptBench with seven perturbation types across ten GLUE tasks, CoIN yields an average accuracy improvement of $+2.5\%$ over continual instruction tuning and reduces output variance, with larger gains on paraphrase identification and grammar correctness. The approach is complemented by augmenting the FLAN dataset with 52k paraphrase-based entries, offering a practical boost to robustness without extra data or training steps beyond the contrastive objective, and suggesting broad applicability to robustness across modalities and prompts.

Abstract

Instruction tuning has been used as a promising approach to improve the performance of large language models (LLMs) on unseen tasks. However, current LLMs exhibit limited robustness to unseen instructions, generating inconsistent outputs when the same instruction is phrased with slightly varied forms or language styles. This behavior indicates LLMs' lack of robustness to textual variations and generalizability to unseen instructions, potentially leading to trustworthiness issues. Accordingly, we propose Contrastive Instruction Tuning (CoIN), which maximizes the similarity between the hidden representations of semantically equivalent instruction-instance pairs while minimizing the similarity between semantically different ones. To facilitate this approach, we augment the existing FLAN collection by paraphrasing task instructions. Experiments on the PromptBench benchmark show that CoIN consistently improves LLMs' robustness to unseen instructions with variations across character, word, sentence, and semantic levels by an average of +2.5% in accuracy. Code is available at https://github.com/luka-group/CoIN.
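
The objective described above (pull equivalent instruction-instance pairs together, push different ones apart, and train this jointly with generation) can be illustrated with a minimal sketch. This is not the paper's exact formulation: it assumes an InfoNCE-style loss with one positive and one negative per anchor, cosine similarity scaled by a temperature `tau`, and a weight `lam` on the contrastive term. The function names, the default `tau`, and the plain-list vectors are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(h_orig, h_pos, h_neg, tau=0.05):
    """InfoNCE-style loss with one positive and one negative (assumed form).

    Algebraically equal to
        -log( exp(s_pos/tau) / (exp(s_pos/tau) + exp(s_neg/tau)) )
    but written with log1p to avoid forming large intermediate exponentials.
    """
    s_pos = cosine(h_orig, h_pos)  # original vs. paraphrased instruction
    s_neg = cosine(h_orig, h_neg)  # original vs. different instance
    return math.log1p(math.exp((s_neg - s_pos) / tau))


def combined_loss(gen_loss, ctr_loss, lam):
    """Generation loss plus a weighted contrastive term (assumed combination)."""
    return gen_loss + lam * ctr_loss


# Toy 2-d "hidden representations", chosen by hand for illustration only.
aligned = contrastive_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
misaligned = contrastive_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

In this sketch the loss is near zero when the paraphrase representation matches the anchor and the negative is orthogonal to it, and it grows roughly linearly in `1 / tau` when the roles are swapped. In an actual training loop the vectors would come from the model's hidden states and the computation would be done with a tensor library so that gradients can flow.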

Paper Structure (19 sections, 3 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: An example from CoLA [warstadt_neural_2019] shows that current LLMs like Alpaca may generate entirely different responses when presented with semantically equivalent but textually different instructions.
  • Figure 2: Illustration of CoIN. A paraphrased instruction is used as the positive sample (green) given the same instance input and output. An instruction paired with a different instance input and output is used as the negative sample (red). Cosine similarity between the hidden representations of the original and paraphrased instruction-instance pairs is encouraged to be high, and cosine similarity with the paired negative sample is encouraged to be low. Because the cosine similarity between the hidden representations of data from different tasks is already low [liu_how_2023], we use the same instruction paired with a different instance input and output as a hard negative sample to provide a more informative training signal.
  • Figure 3: Models' average accuracy (left) and standard deviation (right) across 10 GLUE datasets, with each dataset having six unseen instructions with no perturbation (clean) or perturbation added at character, word, sentence, and semantic levels. CoIN consistently improves accuracy and lowers standard deviation across all types of perturbation compared to the base model and continual instruction tuning. CoIN obtains significant improvement in robustness against word, character, and sentence level perturbations.
  • Figure 4: UMAP [mcinnes_umap_2020] visualization of the hidden representations of the decoder's last output token from the continually instruction-tuned model (left) and CoIN (right). 300 data points are selected from CoLA [warstadt_neural_2019] with no perturbations (clean) or perturbations added at different levels. CoIN's representations of inputs with instruction variations are clustered closer to each other compared to the continually instruction-tuned model, especially for inputs with perturbations at the word, character, and sentence levels.
  • Figure 5: CoIN's performance by the maximum weight $\lambda$ assigned to the contrastive loss. CoIN achieves the highest average accuracy at $\lambda=10^3$.
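
The sample construction in the Figure 2 caption can be sketched as follows. This is one reading of the caption, not the released implementation: the `Example` record, its field names, and `build_triplets` are hypothetical, and the negative is assumed to be drawn at random from entries that share the anchor's instruction but differ in both input and output. Entries with no such candidate are skipped.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    instruction: str  # original task instruction
    paraphrase: str   # paraphrased instruction (hypothetical field)
    input: str        # instance input
    output: str       # instance output


def build_triplets(examples, seed=0):
    """Build (anchor, positive, negative) triplets from a list of Example records."""
    rng = random.Random(seed)

    # Group entries by instruction so hard negatives share the anchor's instruction.
    by_instruction = {}
    for ex in examples:
        by_instruction.setdefault(ex.instruction, []).append(ex)

    triplets = []
    for ex in examples:
        candidates = [
            other
            for other in by_instruction[ex.instruction]
            if other.input != ex.input and other.output != ex.output
        ]
        if not candidates:
            continue  # no usable hard negative for this entry
        neg = rng.choice(candidates)
        triplets.append({
            # Original instruction with its own instance.
            "anchor": (ex.instruction, ex.input, ex.output),
            # Paraphrased instruction, same instance input and output.
            "positive": (ex.paraphrase, ex.input, ex.output),
            # Same instruction, different instance input and output.
            "negative": (ex.instruction, neg.input, neg.output),
        })
    return triplets
```

Drawing negatives from the same instruction follows the caption's reasoning that pairs from different tasks are already far apart in representation space, so they would contribute little training signal.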