
Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers

Tianhua Zhang, Kun Li, Hongyin Luo, Xixin Wu, James Glass, Helen Meng

TL;DR

AdaQR is a weakly supervised framework for adaptive query rewriting in open-domain conversational QA. It trains a rewriter with limited rewrite labels, generates rewrite candidates, and uses the marginal probability of answers over retrieved passages as a retriever-preference reward, optimizing via Direct Preference Optimization. The approach yields consistent in-domain improvements and strong cross-domain adaptation without passage labels, and it performs well with both sparse and dense retrievers. The result is a practical, data-efficient path to deploying robust query rewriting (QR) systems across diverse CQA tasks.

Abstract

Query rewriting is a crucial technique for passage retrieval in open-domain conversational question answering (CQA). It decontextualizes conversational queries into self-contained questions suitable for off-the-shelf retrievers. Existing methods attempt to incorporate the retriever's preference during the training of rewriting models. However, these approaches typically rely on extensive annotations such as in-domain rewrites and/or relevant passage labels, limiting the models' generalization and adaptation capabilities. In this paper, we introduce AdaQR ($\textbf{Ada}$ptive $\textbf{Q}$uery $\textbf{R}$ewriting), a framework for training query rewriting models with limited rewrite annotations from seed datasets and no passage labels at all. Our approach begins by fine-tuning compact large language models using only ~$10\%$ of the rewrite annotations from the seed dataset's training split. The models are then used to generate rewrite candidates for each query instance. We then propose a novel approach to assess the retriever's preference for these candidates: the probability of the answer conditioned on the conversational query, marginalized over the Top-$K$ retrieved passages. This probability serves as the reward for further optimizing the rewriter with Direct Preference Optimization (DPO), a process free of rewrite and retrieval annotations. Experimental results on four open-domain CQA datasets demonstrate that AdaQR not only enhances the in-domain capabilities of the rewriter with limited annotation requirements, but also adapts effectively to out-of-domain datasets.
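The reward and preference-optimization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that retrieval scores for the Top-$K$ passages are turned into a distribution with a softmax, that an external language model supplies the log-probability of the answer given the conversation and each passage, and that preference pairs are formed from the highest- and lowest-reward candidates. The function names (`marginal_answer_logprob`, `build_preference_pair`, `dpo_loss`) are hypothetical. Only the DPO loss follows the standard published formula.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def marginal_answer_logprob(
    retrieval_scores: Sequence[float],
    answer_logprobs: Sequence[float],
) -> float:
    """log sum_k P(p_k | rewrite) * P(answer | conversation, p_k).

    retrieval_scores: retriever scores of the Top-K passages for one rewrite
        (softmax normalization is an assumption of this sketch).
    answer_logprobs: log P(answer | conversation, p_k) from some language
        model, one value per passage (assumed to be computed elsewhere).
    """
    log_p_passage = log_softmax(retrieval_scores)
    terms = [lp + la for lp, la in zip(log_p_passage, answer_logprobs)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def build_preference_pair(
    candidates: Sequence[str], rewards: Sequence[float]
) -> Tuple[str, str]:
    """Return (chosen, rejected): best vs. worst reward (an assumed pairing rule)."""
    ranked = sorted(zip(rewards, candidates), reverse=True)
    return ranked[0][1], ranked[-1][1]


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair, given sequence log-probabilities."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)), written to avoid overflow for either sign of margin.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In this sketch, a rewrite that retrieves passages under which the answer is more likely receives a higher reward, so no passage relevance label is needed to rank candidates; only the conversation and its answer are used.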

Paper Structure

This paper contains 23 sections, 4 equations, 5 figures, and 11 tables.

Figures (5)

  • Figure 1: Illustration of AdaQR which applies preference optimization to the rewriter $\mathcal{M}_{\theta}$.
  • Figure 2: Average performance of Pseudo-Label and Ours over SFT as the F1-score (x-axis) between the answers and gold passages declines. Scores $>1$ denote improvement over SFT. The four vertical lines correspond to the F1-scores of QReCC ($0.704$), Doc2Dial ($0.525$), MultiDoc2Dial ($0.522$) and TopiOCQA ($0.392$).
  • Figure 3: Retrieval performance with varying top-$K$ values ($K=1,3,5,7,9$) in reward calculation using QReCC-SFT. See detailed results in Appendix Table \ref{tab: analysis-topk}. We further analyze two passage organization types (concatenation and marginalization) in Appendix \ref{sec: appendix-concate}.
  • Figure 4: Average retrieval performance with varying number of training data during preference optimization under QReCC-SFT setting.
  • Figure 5: Performance ratio of turns with topic-shift and all instances on the test set of TopiOCQA with dense and sparse retrievers.