Order Matters: Investigate the Position Bias in Multi-constraint Instruction Following

Jie Zeng, Qianyu He, Qingyu Ren, Jiaqing Liang, Yanghua Xiao, Weikang Zhou, Zeye Sun, Fei Yu

TL;DR

This work identifies and quantifies position bias in multi-constraint instruction following by introducing the Constraint Difficulty Distribution Index (CDDI). It develops a probing task that synthesizes diverse multi-constraint instructions and evaluates LLMs under single-round and multi-round inference, revealing consistent improvements when constraints are ordered from hard to easy. An explanation study using gradient-based attributions links attention patterns to constraint-order performance, showing a strong correlation between constraint-specific attention and accuracy. The results point to a systematic phenomenon that holds across model architectures and sizes, with practical implications for prompt design and for evaluating LLMs on real-world, constraint-rich tasks.
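
The attribution study can be illustrated at its aggregation step. The sketch below is not the authors' code: it assumes token-level attribution scores (for instance, gradient-times-input magnitudes) have already been computed for every prompt token, and that the token span of each constraint is known. The function name `constraint_importance` and the `(start, end)` span format are hypothetical. It sums absolute scores inside each constraint span and divides by the total over the prompt, giving per-constraint shares in the spirit of Figure 6(a) and a constraint-part total in the spirit of Figure 6(b).

```python
from typing import Dict, List, Tuple


def constraint_importance(
    token_scores: List[float], spans: Dict[str, Tuple[int, int]]
) -> Dict[str, float]:
    """Share of total absolute attribution that falls inside each constraint span.

    token_scores: one precomputed (assumed) attribution score per prompt token.
    spans: hypothetical mapping constraint name -> (start, end), end exclusive.
    """
    total = sum(abs(s) for s in token_scores)
    if total == 0:
        return {name: 0.0 for name in spans}
    return {
        name: sum(abs(s) for s in token_scores[start:end]) / total
        for name, (start, end) in spans.items()
    }


# Hypothetical scores for a 10-token prompt containing two constraints.
scores = [0.5, 0.0, 1.0, 1.0, 0.5, 0.0, 0.5, 0.5, 0.0, 0.0]
spans = {"constraint_1": (2, 4), "constraint_2": (6, 8)}
weights = constraint_importance(scores, spans)
print(weights)                 # {'constraint_1': 0.5, 'constraint_2': 0.25}
print(sum(weights.values()))   # 0.75: total share on the constraint part
```

Taking absolute values treats strongly negative attributions as important too; a signed variant would be a different design choice.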

Abstract

Real-world instructions with multiple constraints pose a significant challenge to existing large language models (LLMs). We observe that LLMs exhibit dramatic performance fluctuations when the order of the incorporated constraints is perturbed. Yet no existing work has systematically investigated this position bias problem in multi-constraint instruction following. To bridge this gap, we design a probing task in which we quantitatively measure the difficulty distribution of the constraints with a novel Constraint Difficulty Distribution Index (CDDI). Our experimental results show that LLMs perform better when the constraints are presented in a "hard-to-easy" order. This preference generalizes to LLMs with different architectures or parameter sizes. Additionally, we conduct an explanation study that provides intuitive insight into the correlation between the LLM's attention and the constraint order. Our code and dataset are publicly available at https://github.com/meowpass/PBIF.
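
The summary does not reproduce the CDDI formula, so the following is only a stand-in for the idea of scoring a constraint order by its difficulty distribution. It is a Kendall-tau-style index, not the authors' definition: each pair of positions contributes +1 if the earlier constraint is harder and -1 if it is easier, and the sum is normalized to [-1, 1]. Per-constraint difficulties are assumed inputs (for example, one minus a model's accuracy on that constraint type), and `order_index` is a hypothetical name. Its sign convention matches the figure captions: higher values mean a more "hard-to-easy" order.

```python
from itertools import combinations
from typing import Sequence


def order_index(difficulties: Sequence[float]) -> float:
    """Kendall-tau-style score in [-1, 1] for how 'hard-to-easy' an order is.

    difficulties[i] is the assumed difficulty of the constraint at position i.
    +1.0: strictly hard-to-easy; -1.0: strictly easy-to-hard; ties add 0.
    """
    n = len(difficulties)
    if n < 2:
        return 0.0
    score = 0
    for i, j in combinations(range(n), 2):  # i < j, so position i comes first
        if difficulties[i] > difficulties[j]:
            score += 1  # harder constraint placed earlier
        elif difficulties[i] < difficulties[j]:
            score -= 1  # easier constraint placed earlier
    return score / (n * (n - 1) // 2)


print(order_index([0.9, 0.6, 0.3]))            # 1.0  (hard-to-easy)
print(order_index([0.3, 0.6, 0.9]))            # -1.0 (easy-to-hard)
print(round(order_index([0.6, 0.9, 0.3]), 3))  # 0.333 (mixed)
```

A partially shuffled order lands between the two extremes, which gives a graded value for sweeping many orderings of the same constraint set.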

Paper Structure

This paper contains 24 sections, 8 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 2: The procedure of the probing task. First, we synthesize the initial instructions by sampling seed instructions and corresponding constraints. Then, we obtain instructions with different constraint orders by reordering the incorporated constraints. Finally, we run model inference in both single-round and multi-round settings.
  • Figure 3: Statistics of the different constraint types in the probing data. 7cons and 9cons denote the settings with $n$=7 and $n$=9 constraints, respectively.
  • Figure 4: Performance of different LLMs in single-round inference. The left and right panels show results with the number of constraints $n$ set to 7 and 9, respectively. As CDDI increases, the constraint order shifts from "easy-to-hard" to "hard-to-easy".
  • Figure 5: Performance of different LLMs in multi-round inference. The left and right panels show results with the number of constraints $n$ set to 7 and 9, respectively. As CDDI increases, the constraint order shifts from "easy-to-hard" to "hard-to-easy".
  • Figure 6: (a) The importance weights the LLM assigns to constraints at different positions. (b) The total importance weight assigned to the constraint part of the multi-constraint instructions under three different constraint distributions.
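
The probing procedure in Figure 2 (sample a set of constraints, then present the same set in different orders) can be sketched as follows. Everything concrete here is hypothetical: the constraint pool, the difficulty scores, the numbered-list prompt template, and the function names `build_prompt` and `ordered_variants`. Only single-round prompt construction is shown; the paper's multi-round setting is not modeled.

```python
import random
from typing import Dict, List


def build_prompt(seed: str, constraints: List[str]) -> str:
    """Hypothetical single-round template: seed instruction, then numbered constraints."""
    lines = [seed] + [f"{k}. {c}" for k, c in enumerate(constraints, start=1)]
    return "\n".join(lines)


def ordered_variants(
    seed: str, difficulty: Dict[str, float], n: int, rng: random.Random
) -> Dict[str, str]:
    """Sample n constraints once, then render the same set in three orders."""
    chosen = rng.sample(sorted(difficulty), n)  # same constraints for every variant
    easy_to_hard = sorted(chosen, key=lambda c: difficulty[c])
    hard_to_easy = easy_to_hard[::-1]
    shuffled = chosen[:]
    rng.shuffle(shuffled)  # a random baseline order
    return {
        "easy_to_hard": build_prompt(seed, easy_to_hard),
        "random": build_prompt(seed, shuffled),
        "hard_to_easy": build_prompt(seed, hard_to_easy),
    }


# Hypothetical constraint pool with assumed difficulty scores in [0, 1].
pool = {
    "Answer in English.": 0.1,
    "Use at most 120 words.": 0.5,
    "Format the whole answer as JSON.": 0.6,
    "Do not use the letter 'e'.": 0.9,
}
variants = ordered_variants("Write a short note about tea.", pool, n=3, rng=random.Random(0))
print(variants["hard_to_easy"])
```

Sampling the constraint set once and only permuting it keeps the content fixed across variants, so any performance difference can be attributed to order rather than to which constraints were chosen.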