Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers

Tianhua Zhang; Kun Li; Hongyin Luo; Xixin Wu; James Glass; Helen Meng

TL;DR

AdaQR is a weakly supervised framework for adaptive query rewriting (QR) in open-domain conversational question answering (CQA). It trains a rewriter with limited rewrite labels, generates rewrite candidates, and uses the marginal probability of answers over retrieved passages as a retriever-preference reward, which it optimizes with Direct Preference Optimization. The approach yields consistent in-domain improvements and strong cross-domain adaptation without passage labels, and it performs well with both sparse and dense retrievers. The result is a practical, data-efficient path to deploying QR systems across diverse CQA tasks.

Abstract

Query rewriting is a crucial technique for passage retrieval in open-domain conversational question answering (CQA). It decontextualizes conversational queries into self-contained questions suitable for off-the-shelf retrievers. Existing methods attempt to incorporate the retriever's preference during the training of rewriting models. However, these approaches typically rely on extensive annotations such as in-domain rewrites and/or relevant passage labels, limiting the models' generalization and adaptation capabilities. In this paper, we introduce AdaQR ($\textbf{Ada}$ptive $\textbf{Q}$uery $\textbf{R}$ewriting), a framework for training query rewriting models with limited rewrite annotations from seed datasets and no passage labels at all. Our approach begins by fine-tuning compact large language models using only ~$10\%$ of the rewrite annotations from the seed dataset's training split. The models are then used to generate rewrite candidates for each query instance. We then assess the retriever's preference for each candidate by the probability of the answer conditioned on the conversational query, marginalized over the Top-$K$ retrieved passages. This probability serves as the reward for further optimizing the rewriter with Direct Preference Optimization (DPO), a process free of rewrite and retrieval annotations. Experimental results on four open-domain CQA datasets demonstrate that AdaQR not only enhances the in-domain capabilities of the rewriter with limited annotation requirements, but also adapts effectively to out-of-domain datasets.
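The reward and preference steps described in the abstract can be illustrated with a minimal sketch. It is one plausible reading, not the authors' implementation: it assumes the retrieval scores of the Top-$K$ passages are normalized with a softmax to weight each passage, that a language model supplies the log-probability of the answer given each passage and the conversation, and that candidate rewrites are paired by comparing rewards. The function names, the `min_gap` threshold, and the `beta` default are hypothetical. The last function is the standard per-pair DPO loss.

```python
import math
from itertools import combinations


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_total = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_total for s in scores]


def marginal_answer_logprob(retrieval_scores, answer_logprobs):
    """Reward for one rewrite: log sum_k p(passage_k | rewrite) * p(answer | passage_k, conversation).

    Assumption: passage weights are a softmax over the Top-K retrieval scores.
    """
    if len(retrieval_scores) != len(answer_logprobs):
        raise ValueError("need one answer log-probability per retrieved passage")
    terms = [w + lp for w, lp in zip(log_softmax(retrieval_scores), answer_logprobs)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))  # log-sum-exp


def build_preference_pairs(rewards, min_gap=0.0):
    """Turn {rewrite: reward} into (chosen, rejected) pairs; ties within min_gap are skipped."""
    pairs = []
    for a, b in combinations(rewards, 2):
        gap = rewards[a] - rewards[b]
        if abs(gap) > min_gap:
            pairs.append((a, b) if gap > 0 else (b, a))
    return pairs


def dpo_pair_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities of each rewrite."""
    x = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(x)) computed stably as softplus(-x)
    return math.log1p(math.exp(-x)) if x >= 0 else -x + math.log1p(math.exp(x))
```

In this sketch, a rewrite whose retrieved passages make the gold answer more likely receives a higher reward, so it becomes the chosen side of a pair without any passage-relevance label being consulted.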

Paper Structure (23 sections, 4 equations, 5 figures, 11 tables)

Introduction
Methodology
Task Formulation
Overview
Supervised Fine-Tuning
Reward Collection
Preference Optimization
Experiments
Main Results
Analysis
Comparison of Weakly Supervised Approaches
Effect of $K$ Values in Reward Calculation
Effect of Data Volume in Preference Optimization
Related Works
Conclusion
...and 8 more sections

Figures (5)

Figure 1: Illustration of AdaQR which applies preference optimization to the rewriter $\mathcal{M}_{\theta}$.
Figure 2: Average performance of Pseudo-Label and Ours over SFT as the F1-score (x-axis) between the answers and gold passages declines. Scores $>1$ denote improvement over SFT. The four vertical lines correspond to the F1-scores of QReCC ($0.704$), Doc2Dial ($0.525$), MultiDoc2Dial ($0.522$) and TopiOCQA ($0.392$).
Figure 3: Retrieval performance with varying top-$K$ values ($K=1,3,5,7,9$) in reward calculation using QReCC-SFT. See detailed results in Appendix Table \ref{tab: analysis-topk}. We further analyze two passage organization types (concatenation and marginalization) in Appendix \ref{sec: appendix-concate}.
Figure 4: Average retrieval performance with varying number of training data during preference optimization under QReCC-SFT setting.
Figure 5: Performance ratio of turns with topic-shift and all instances on the test set of TopiOCQA with dense and sparse retrievers.
