R.I.P.: Better Models by Survival of the Fittest Prompts
Ping Yu, Weizhe Yuan, Olga Golovneva, Tianhao Wu, Sainbayar Sukhbaatar, Jason Weston, Jing Xu
TL;DR
Data quality is a key driver of instruction-following performance in LLMs. The authors introduce RIP, a data-filtering method that curates prompts using the quality of the rejected response and the reward gap between the chosen and rejected responses. They also introduce Self-RIP, which generates high-quality synthetic prompts. The method is applied to both human-written and synthetic data, training Llama 3.1-8B-Instruct and Llama 3.3-70B-Instruct with DPO. RIP consistently outperforms baseline filtering methods on AlpacaEval2, Arena-Hard, and WildBench, and Self-RIP improves results further. The approach generalizes well and filters out noisy prompts. It also suggests potential safety and scalability benefits for future RLHF workflows.
Abstract
Training data quality is one of the most important drivers of final model quality. In this work, we introduce a method for evaluating data integrity based on the assumption that low-quality input prompts result in high-variance, low-quality responses. This is achieved by measuring the quality of the rejected response and the reward gap between the chosen and rejected responses in each preference pair. Our method, Rejecting Instruction Preferences (RIP), can be used to filter prompts from existing training sets or to build high-quality synthetic datasets. It yields large performance gains across various benchmarks compared to unfiltered data. Using Llama 3.1-8B-Instruct, RIP improves AlpacaEval2 LC Win Rate by 9.4%, Arena-Hard by 8.7%, and WildBench by 9.9%. Using Llama 3.3-70B-Instruct, RIP improves Arena-Hard from 67.5 to 82.9, moving the model from 18th to 6th place overall on the leaderboard.
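The filtering idea can be illustrated with a small sketch. It rests on several assumptions that go beyond the text above. Each prompt has several sampled responses already scored by some reward model. The chosen response is taken to be the highest-scoring one and the rejected response the lowest-scoring one. Prompts are kept when their rejected reward is at or above a percentile threshold and their reward gap is at or below one. The names `PromptSample`, `build_pair_stats`, `lower_percentile`, and `rip_filter` are hypothetical, as are the 50th-percentile defaults. This is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class PromptSample:
    prompt: str
    # Hypothetical: reward-model scores for N responses sampled for this prompt.
    rewards: list[float]


def build_pair_stats(sample: PromptSample) -> tuple[float, float]:
    """Return (rejected_reward, reward_gap) for one prompt.

    Assumption: chosen = highest-reward response, rejected = lowest-reward response.
    """
    chosen = max(sample.rewards)
    rejected = min(sample.rewards)
    return rejected, chosen - rejected


def lower_percentile(values: list[float], q: float) -> float:
    """Lower percentile: the sorted value at index floor(q/100 * (n - 1))."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    k = int(q / 100 * (len(ordered) - 1))
    return ordered[k]


def rip_filter(
    samples: list[PromptSample], rejected_q: float = 50.0, gap_q: float = 50.0
) -> list[PromptSample]:
    """Keep prompts whose rejected response scores high and whose reward gap is small.

    Hypothetical thresholds: percentiles of each statistic over the whole dataset.
    """
    stats = [build_pair_stats(s) for s in samples]
    rejected_threshold = lower_percentile([r for r, _ in stats], rejected_q)
    gap_threshold = lower_percentile([g for _, g in stats], gap_q)
    return [
        s
        for s, (rejected, gap) in zip(samples, stats)
        if rejected >= rejected_threshold and gap <= gap_threshold
    ]


# Toy data (made up for illustration): a low rejected reward or a wide
# reward gap marks a prompt as likely noisy.
samples = [
    PromptSample("good prompt", [7.0, 8.0, 8.5]),
    PromptSample("noisy prompt", [1.0, 9.0, 5.0]),
    PromptSample("ok prompt", [6.0, 6.5, 7.5]),
    PromptSample("hard prompt", [2.0, 3.0, 4.0]),
]

kept = rip_filter(samples)
print([s.prompt for s in kept])  # ['good prompt', 'ok prompt']
```

Percentile thresholds are used here so that the cut adapts to the reward model's scale. A fixed reward cutoff would also fit the description above. The surviving prompts would then supply the chosen/rejected pairs for DPO training.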
