PILLOW: Enhancing Efficient Instruction Fine-tuning via Prompt Matching
Zhenting Qi, Xiaoyu Tan, Shaojie Shi, Chao Qu, Yinghui Xu, Yuan Qi
TL;DR
PILLOW tackles the high cost of supervised fine-tuning (SFT) for large language models by combining LoRA with a discrimination-based prompt matching mechanism trained with reinforcement learning (RL). It learns to select exemplars from a user-defined pool and prepend them to the user instruction, enabling strong instruction following with modest compute on consumer-grade GPUs. The approach achieves performance comparable to SFT on standard instruction datasets (Alpaca and Dolly) across multiple model sizes, with larger models deriving greater gains, while remaining efficient. This work offers a practical path to accessible, high-quality instruction tuning in low-resource settings and opens avenues for further RL-based prompting research.
Abstract
Instruction fine-tuning has conventionally been employed to adapt Large Language Models (LLMs) to a variety of tasks. Nonetheless, this technique often necessitates substantial computational resources, making it impractical for individuals or small-scale entities. Recently, Low-Rank Adaptation (LoRA) has become a promising alternative, offering capabilities on par with full fine-tuning at reduced resource overhead. However, attaining satisfactory performance through LoRA fine-tuning is non-trivial. In this paper, we propose PILLOW, which aims to improve LoRA's performance through a discrimination-based prompting method that leverages LLMs' In-Context Learning ability. PILLOW incorporates a matching network that selects prompts from a user-defined prompt pool, concatenates the selected prompts with the user instruction as input, and performs inference using the LoRA-fine-tuned LLM. Trained with Reinforcement Learning, PILLOW achieves performance commensurate with typical instruction fine-tuning methods across various evaluation metrics, while using only consumer-grade GPU resources and substantially reducing computational cost.
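The select-then-concatenate step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the learned, RL-trained matching network is replaced here by a plain bag-of-words cosine similarity, and the names `bow`, `cosine`, `select_prompts`, and `build_input`, the blank-line separator, and the example pool are all hypothetical. The call to the LoRA-fine-tuned LLM is omitted; the sketch only builds the input string that such a model would receive.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (assumed stand-in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_prompts(instruction, pool, k=1):
    """Return the k pool prompts scoring highest against the instruction.

    In PILLOW this scoring is done by a matching network trained with RL;
    the cosine score here is only an illustrative placeholder.
    """
    query = bow(instruction)
    ranked = sorted(pool, key=lambda p: cosine(query, bow(p)), reverse=True)
    return ranked[:k]


def build_input(instruction, pool, k=1):
    """Prepend the selected prompts to the user instruction."""
    selected = select_prompts(instruction, pool, k)
    return "\n\n".join(selected + [instruction])


if __name__ == "__main__":
    # Hypothetical user-defined prompt pool.
    pool = [
        "Instruction: Translate cat into French. Response: chat",
        "Instruction: Add 2 and 3. Response: 5",
    ]
    print(build_input("Instruction: Translate dog into French.", pool, k=1))
```

The string returned by `build_input` would then be passed to the LoRA-fine-tuned model for inference. In a trained system, the similarity function would be swapped for the learned matching network while the surrounding selection and concatenation logic stays the same.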
