Fine-Tuning on Noisy Instructions: Effects on Generalization and Performance
Ahmed Alajrami, Xingwei Tan, Nikolaos Aletras
TL;DR
The paper investigates whether fine-tuning base LLMs on perturbed instructions improves robustness to noisy user prompts. Using six perturbation strategies across four instruction-tuning datasets and three large benchmarks, the authors show that training with noisy instructions often boosts performance under noise and can even improve results on clean prompts, with chain-of-thought (CoT) prompting generally outperforming direct prompting. The experiments cover Qwen and Llama models of multiple sizes, fine-tuned with LoRA/QLoRA, and report safety and bias evaluations alongside standard benchmarks. The findings suggest that perturbations act as a regularizer and a data-augmentation signal, broadening the model's task representations and improving resilience to the imperfect prompts common in real-world use. Future work could explore adaptive perturbation strategies and language-specific effects.
Abstract
Instruction-tuning plays a vital role in enhancing the task-solving abilities of large language models (LLMs), improving their usability in generating helpful responses on various tasks. However, previous work has demonstrated that LLMs are sensitive to minor variations in instruction phrasing. In this paper, we explore whether introducing perturbations in instruction-tuning data can enhance LLMs' resistance to noisy instructions. We focus on how instruction-tuning with perturbations, such as removing stop words or shuffling words, affects LLMs' performance on the original and perturbed versions of widely used benchmarks (MMLU, BBH, GSM8K). We further assess learning dynamics and potential shifts in model behavior. Surprisingly, our results suggest that instruction-tuning on perturbed instructions can, in some cases, improve downstream performance. These findings highlight the importance of including perturbed instructions in instruction-tuning, which can make LLMs more resilient to noisy user inputs.
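To make the two perturbations named in the abstract concrete, here is a minimal sketch of how they might be applied to instruction-tuning data. It is not the authors' implementation. The stop-word list, whitespace tokenization, the `ratio` of perturbed examples, and the helper `perturb_dataset` are all illustrative assumptions, and only two of the paper's six perturbation strategies are shown. Responses are left untouched so the model still learns the correct output from a noisy instruction.

```python
import random

# Illustrative stop-word list (assumption: not the list used in the paper).
STOP_WORDS = {
    "a", "an", "the", "of", "to", "in", "on", "and", "or",
    "is", "are", "for", "with", "by", "that", "this",
}


def remove_stop_words(instruction: str) -> str:
    """Drop words whose lowercased form, minus trailing punctuation, is a stop word."""
    kept = [
        w for w in instruction.split()
        if w.lower().strip(".,;:!?") not in STOP_WORDS
    ]
    return " ".join(kept)


def shuffle_words(instruction: str, rng: random.Random) -> str:
    """Randomly permute the whitespace-separated words of the instruction."""
    words = instruction.split()
    rng.shuffle(words)
    return " ".join(words)


def perturb_dataset(examples, ratio=0.5, seed=0):
    """Hypothetical helper: perturb roughly `ratio` of instructions.

    Each example is a dict with "instruction" and "response" keys.
    Inputs are not mutated; responses are copied unchanged.
    """
    rng = random.Random(seed)
    perturbations = [remove_stop_words, lambda s: shuffle_words(s, rng)]
    out = []
    for ex in examples:
        new_ex = dict(ex)  # shallow copy so the original example is preserved
        if rng.random() < ratio:
            perturb = rng.choice(perturbations)
            new_ex["instruction"] = perturb(ex["instruction"])
        out.append(new_ex)
    return out


if __name__ == "__main__":
    instruction = "Summarize the main idea of the following paragraph."
    print(remove_stop_words(instruction))
    # Summarize main idea following paragraph.
    print(shuffle_words(instruction, random.Random(0)))
```

Mixing perturbed and clean instructions, rather than perturbing every example, is one simple way to treat noise as data augmentation; the right ratio is an open choice here and is not taken from the paper.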
