Generalizing Verifiable Instruction Following

Valentina Pyatkin, Saumya Malik, Victoria Graf, Hamish Ivison, Shengyi Huang, Pradeep Dasigi, Nathan Lambert, Hannaneh Hajishirzi

TL;DR

This work tackles the challenge of generalizing precise instruction following by introducing IFBench, a diverse benchmark of unseen verifiable constraints, and IFTrain, a set of training constraints that broadens what models can learn from. It trains models with RLVR, using GRPO, to better satisfy multiple verifiable constraints, and reports substantial improvements on IFEval and IFBench across several model families. The study also analyzes data design, constraint variable ranges, and multi-turn instruction scenarios to understand when and how generalization occurs, and it highlights trade-offs between constraint adherence and task performance. Overall, IFBench reveals persistent generalization gaps, and RLVR with varied training signals offers a practical path to more robust constraint following in real-world, multi-turn chat applications.

Abstract

A crucial factor for successful human and AI interaction is the ability of language models or chatbots to follow human instructions precisely. A common feature of instructions is the output constraint, such as "only answer with yes or no" or "mention the word 'abrakadabra' at least 3 times", which the user adds to craft a more useful answer. Satisfying such constraints is a skill called precise instruction following, and even today's strongest models struggle with it. We find that most models strongly overfit on the small set of verifiable constraints used by the benchmarks that test this skill, and do not generalize well to unseen output constraints. We introduce a new benchmark, IFBench, to evaluate precise instruction following generalization on 58 new, diverse, and challenging verifiable out-of-domain constraints. In addition, we perform an extensive analysis of how and on what data models can be trained to improve precise instruction following generalization. Specifically, we carefully design constraint verification modules and show that reinforcement learning with verifiable rewards (RLVR) significantly improves instruction following. In addition to IFBench, we release 29 additional new hand-annotated training constraints and verification functions, RLVR training prompts, and code.
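To make the idea of a verifiable constraint concrete, here is a minimal sketch of verification functions for the two example constraints quoted in the abstract. The function names, the matching rules (case-insensitive, whole-word), and the reward definition (fraction of constraints satisfied) are illustrative assumptions, not the paper's released verification code.

```python
import re
from typing import Callable, List


def only_yes_or_no(response: str) -> bool:
    """Check that the whole response is 'yes' or 'no'.

    Assumption: case and surrounding '.'/'!' punctuation are ignored.
    """
    return response.strip().strip(".!").lower() in {"yes", "no"}


def mentions_word_at_least(response: str, word: str, n: int) -> bool:
    """Check that `word` occurs as a whole word at least `n` times.

    Assumption: matching is case-insensitive.
    """
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, response, flags=re.IGNORECASE)) >= n


def verifiable_reward(response: str, checks: List[Callable[[str], bool]]) -> float:
    """Hypothetical reward: the fraction of constraints the response satisfies."""
    if not checks:
        return 0.0
    return sum(check(response) for check in checks) / len(checks)


# Usage: score one response against two constraints at once.
checks = [
    only_yes_or_no,
    lambda r: mentions_word_at_least(r, "abrakadabra", 3),
]
reward = verifiable_reward("yes", checks)
```

Because each check is a deterministic function of the output text, a score like this can be computed without a learned reward model, which is what makes such constraints usable as a reward signal in RLVR.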

Paper Structure

This paper contains 33 sections, 2 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Model performance on IFEval and IFBench (single-turn). Left Models: out-of-the-box performance. Right Models: after IF-RLVR training. IFBench has either 1 or 2 constraints per instruction.
  • Figure 2: Training on 1 to 6 constraints per instruction. (Tülu-DPO policy)
  • Figure 3: Training on 10, 100, 500 and 1000 instances per constraint.
  • Figure 4: Training on IFTrain (ood) + n constraints (in-domain) from IFEval. (Tülu-DPO policy and Qwen2.5)
  • Figure 5: Experiments with variable ranges. (Tülu-DPO policy)
  • ...and 4 more figures