Order Independence With Finetuning
Katrina Brown, Reid McIlroy
TL;DR
The paper tackles the problem of order dependence in large language models during multiple-choice QA by integrating Set-Based Prompting (SBP) into finetuning. It introduces a margin-based contrastive objective to align SBP-formatted prompts with the model’s training distribution, mitigating performance drops seen when SBP is applied at inference time alone. Across in-distribution MMLU and out-of-distribution CSQA and ARC Challenge, SBP finetuning significantly boosts order-invariant accuracy while preserving overall language modeling capabilities, with margin-based training outperforming standard cross-entropy. The work demonstrates the promise of order-invariant modeling for fairer, more reliable LLMs and suggests directions for extending SBP to other tasks and configurations.
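To make the contrast between a margin-based objective and cross-entropy concrete, here is a minimal sketch of a hinge-style margin loss over answer-option scores. The summary above does not give the paper's exact formulation, so the function name `margin_loss`, the default margin of 1.0, and the averaging over incorrect options are all assumptions made for illustration.

```python
from typing import Sequence


def margin_loss(scores: Sequence[float], correct: int, margin: float = 1.0) -> float:
    """Illustrative hinge-style margin loss (an assumption, not the paper's exact objective).

    scores  -- one model score (e.g. a logit) per answer option
    correct -- index of the gold answer option
    margin  -- how far the gold score should exceed every other score
    """
    gold = scores[correct]
    # Each incorrect option is penalized only if it comes within `margin` of the gold score.
    penalties = [
        max(0.0, margin - (gold - s))
        for i, s in enumerate(scores)
        if i != correct
    ]
    # Average over the incorrect options so the loss scale does not depend on option count.
    return sum(penalties) / len(penalties)
```

Unlike cross-entropy, a loss of this shape drops to exactly zero once the gold option leads every other option by the margin, so it stops pushing on examples the model already separates well.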
Abstract
Large language models (LLMs) demonstrate remarkable performance on many NLP tasks, yet often exhibit order dependence: simply reordering semantically identical tokens (e.g., answer choices in multiple-choice questions) can lead to inconsistent predictions. Recent work proposes Set-Based Prompting (SBP) as a way to remove order information from designated token subsets, thereby mitigating positional biases. However, applying SBP to base models produces an input format that is out of distribution for them, which can degrade performance even on in-distribution tasks. We introduce a fine-tuning strategy that integrates SBP into the training process, "pulling" these set-formatted prompts closer to the model's training manifold. Our experiments on in-distribution (MMLU) and out-of-distribution (CSQA, ARC Challenge) multiple-choice tasks show that SBP fine-tuning significantly improves accuracy and robustness to answer-order permutations while preserving broader language modeling capabilities. We discuss the broader implications of order-invariant modeling and outline future directions for building fairer, more consistent LLMs.
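Order dependence as described above can be checked by presenting the same answer options in every order and testing whether the selected option's content ever changes. The sketch below is a toy illustration, not the paper's evaluation code: the `Predictor` interface and both toy predictors are hypothetical stand-ins for a real model.

```python
from itertools import permutations
from typing import Callable, Sequence

# Hypothetical interface: given a question and ordered options, return the chosen index.
Predictor = Callable[[str, Sequence[str]], int]


def is_order_invariant(predict: Predictor, question: str, options: Sequence[str]) -> bool:
    """Return True if the predictor selects the same option text under every ordering."""
    chosen = {perm[predict(question, perm)] for perm in permutations(options)}
    return len(chosen) == 1


def first_option_model(question: str, options: Sequence[str]) -> int:
    # Toy predictor with a pure positional bias: always picks the first slot.
    return 0


def longest_option_model(question: str, options: Sequence[str]) -> int:
    # Toy predictor that depends only on option content, never on position.
    return max(range(len(options)), key=lambda i: len(options[i]))


question = "Which planet is largest?"
options = ["Mars", "Venus", "Jupiter"]
print(is_order_invariant(first_option_model, question, options))
print(is_order_invariant(longest_option_model, question, options))
```

The positionally biased predictor fails the check because its chosen text changes with the ordering, while the content-only predictor passes. Enumerating all permutations is only practical for small option sets such as four-way multiple choice; sampling orderings is the usual fallback for larger sets.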
