RoParQ: Paraphrase-Aware Alignment of Large Language Models Towards Robustness to Paraphrased Questions
Minjoon Choi
TL;DR
RoParQ is a benchmark for measuring cross-paraphrase consistency in closed-book multiple-choice QA (MCQA). It is built by generating paraphrases of questions from standard datasets with proprietary models, then keeping only the examples on which a judge model shows inconsistent confidence. The accompanying metric, XParaCon, quantifies robustness as the standard deviation of accuracies across paraphrase variants. The authors also propose a reasoning-based, paraphrase-aware SFT strategy (via LoRA) that aligns models toward semantically invariant answers, and report that fine-tuned lightweight models reach consistency levels comparable to much larger pre-trained models. The work targets superficial memorization in LLMs and offers a practical path toward more reliable, semantically grounded QA systems.
Abstract
Large Language Models (LLMs) often exhibit inconsistent behavior when answering paraphrased questions, suggesting a reliance on surface-level patterns rather than true semantic understanding. To address this limitation, we introduce RoParQ, a benchmark specifically constructed to evaluate cross-paraphrase consistency in closed-book multiple-choice QA. This benchmark is derived from standard datasets by generating paraphrases via proprietary models and selectively retaining examples that elicit inconsistent confidence from a judge model. We further propose XParaCon, a novel evaluation metric that quantifies a model's robustness by measuring the standard deviation of accuracies across question variants. Additionally, we implement a reasoning-based, paraphrase-aware Supervised Fine-Tuning (SFT) strategy designed to align models toward semantic invariance. Our experiments demonstrate that this targeted alignment significantly enhances robustness. Notably, fine-tuned lightweight models achieved consistency levels comparable to much larger pre-trained models. These results highlight the efficacy of our approach in mitigating superficial memorization and fostering more robust, reliable LLMs.
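As a rough illustration of the quantity underlying XParaCon, the sketch below computes one accuracy per paraphrase variant and then the standard deviation of those accuracies. The abstract does not give the exact formula, so the function names, the data layout, and the use of the population standard deviation are assumptions; any further scaling or transformation the metric applies is not reproduced here.

```python
from statistics import pstdev

def variant_accuracies(predictions, gold):
    """Accuracy of each paraphrase variant.

    predictions[v][i] is the model's answer to question i when asked
    with paraphrase variant v; gold[i] is the correct option.
    (Hypothetical layout, chosen for illustration.)
    """
    return [
        sum(p == g for p, g in zip(variant, gold)) / len(gold)
        for variant in predictions
    ]

def accuracy_spread(predictions, gold):
    """Population standard deviation of per-variant accuracies.

    Assumption: population (not sample) standard deviation. A value of
    0.0 means accuracy is identical across all variants.
    """
    return pstdev(variant_accuracies(predictions, gold))

gold = ["A", "B", "C", "D"]
predictions = [
    ["A", "B", "C", "D"],  # original wording: 4/4 correct
    ["A", "B", "C", "A"],  # paraphrase 1: 3/4 correct
    ["A", "C", "C", "A"],  # paraphrase 2: 2/4 correct
]
spread = accuracy_spread(predictions, gold)
```

A lower spread means accuracy changes less as the wording changes. Note that a spread of this kind says nothing about accuracy itself: a model that is equally wrong on every variant also scores 0.0, so it is best read alongside the accuracies.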
