History-Aware Conversational Dense Retrieval

Fengran Mo, Chen Qu, Kelong Mao, Tianyu Zhu, Zhan Su, Kaiyu Huang, Jian-Yun Nie

TL;DR

HAConvDR tackles noisy history and shortcut history dependency in conversational dense retrieval by combining context-denoised query reformulation with history-derived supervision signals in a three-stage pipeline. It first generates pseudo relevance judgments for historical turns, then reformulates the current query using only the relevant history, and finally trains a dense retriever with a history-aware contrastive loss that integrates pseudo positives and historical hard negatives. On TopiOCQA and QReCC, HAConvDR yields significant gains over strong baselines, particularly in long, topic-switching sessions, which demonstrates the value of explicit history filtering and supervision. The approach shows the practical benefit of using historical passages as signals to improve retrieval effectiveness in real-world conversational search settings.

Abstract

Conversational search facilitates complex information retrieval by enabling multi-turn interactions between users and the system. Supporting such interactions requires a comprehensive understanding of the conversational inputs to formulate a good search query based on historical information. In particular, the search query should include the relevant information from the previous conversation turns. However, current approaches for conversational dense retrieval primarily rely on fine-tuning a pre-trained ad-hoc retriever using the whole conversational search session, which can be lengthy and noisy. Moreover, existing approaches are limited by the amount of manual supervision signals in the existing datasets. To address the aforementioned issues, we propose a History-Aware Conversational Dense Retrieval (HAConvDR) system, which incorporates two ideas: context-denoised query reformulation and automatic mining of supervision signals based on the actual impact of historical turns. Experiments on two public conversational search datasets demonstrate the improved history modeling capability of HAConvDR, in particular for long conversations with topic shifts.
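The idea of judging historical turns by their actual impact, then reformulating the query from the relevant ones only, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `make_overlap_ranker`, `pseudo_relevance_judgment`, and `reformulate` are hypothetical, the token-overlap ranker is a toy stand-in for a dense retriever, and the rule "a turn is relevant if adding it improves the rank of the gold passage" is one plausible reading of the abstract.

```python
from typing import Callable, Dict, List, Sequence


def make_overlap_ranker(corpus: Dict[str, str]) -> Callable[[str, str], int]:
    """Toy stand-in for a retriever (assumption): scores by token overlap."""

    def rank_of(query: str, passage_id: str) -> int:
        q_tokens = set(query.lower().split())
        scores = {
            pid: len(q_tokens & set(text.lower().split()))
            for pid, text in corpus.items()
        }
        # Higher score first; ties broken by passage id for determinism.
        ordered = sorted(corpus, key=lambda pid: (-scores[pid], pid))
        return ordered.index(passage_id) + 1  # 1-based rank

    return rank_of


def pseudo_relevance_judgment(
    current_query: str,
    history_turns: Sequence[str],
    gold_passage_id: str,
    rank_of: Callable[[str, str], int],
) -> List[bool]:
    """Mark a historical turn as relevant if prepending it to the current
    query improves (lowers) the rank of the gold passage."""
    base_rank = rank_of(current_query, gold_passage_id)
    labels = []
    for turn in history_turns:
        expanded = f"{turn} {current_query}"
        labels.append(rank_of(expanded, gold_passage_id) < base_rank)
    return labels


def reformulate(
    current_query: str, history_turns: Sequence[str], labels: Sequence[bool]
) -> str:
    """Context-denoised reformulation: keep only turns judged relevant."""
    kept = [turn for turn, keep in zip(history_turns, labels) if keep]
    return " ".join(kept + [current_query])


corpus = {
    "p1": "apollo 11 landed on the moon in 1969",
    "p2": "the eiffel tower is in paris",
    "p3": "apollo 11 crew were armstrong aldrin collins",
}
history = ["tell me about apollo 11", "where is the eiffel tower"]
query = "who was on board the mission"

rank_of = make_overlap_ranker(corpus)
labels = pseudo_relevance_judgment(query, history, "p3", rank_of)
denoised_query = reformulate(query, history, labels)
```

In this toy session the first turn helps retrieve the gold passage and is kept, while the off-topic second turn is dropped from the reformulated query.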

Paper Structure (19 sections, 4 equations, 5 figures, 5 tables, 1 algorithm)

This paper contains 19 sections, 4 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Illustration of shortcut history dependency. Because of noise in the reformulated query, passages addressing historical information needs ($p_3^*$) can be ranked higher than those addressing the current information need ($p_4^*$). The highly relevant passage $p_1^*$ could serve as pseudo-relevance feedback (PRF). Red text denotes the gold answer $a_i$ in $p_i^*$.
  • Figure 2: Overview of HAConvDR. The first stage (left) performs pseudo relevance judgment (PRJ) between the current query and each historical turn. Based on the PRJ results, the second stage (middle) performs context-denoised query reformulation and mines positive and negative supervision signals. The third stage (right) trains the conversational dense retriever with history-aware contrastive learning.
  • Figure 3: Proportion of relevant historical turns among all historical turns, as conversations evolve.
  • Figure 4: Percentage of queries whose retrieved list ranks the ground-truth passage of a historical turn higher than the ground-truth passage of the current turn.
  • Figure 5: t-SNE visualization of the embeddings of queries, ground-truth passages, pseudo positives, and historical hard negatives produced by two ANCE models, with and without HAConvDR training.
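
The third stage in Figure 2 trains the retriever with history-aware contrastive learning over pseudo positives and historical hard negatives. A minimal sketch of such a loss is given below, assuming dot-product similarity, a multi-positive InfoNCE-style form, and a hypothetical function name `history_aware_contrastive_loss`; the exact loss used in the paper may differ.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot-product similarity between two embeddings."""
    return sum(a * b for a, b in zip(u, v))


def history_aware_contrastive_loss(
    query: Vector,
    positives: Sequence[Vector],  # gold passage plus pseudo positives
    negatives: Sequence[Vector],  # sampled plus historical hard negatives
    temperature: float = 1.0,     # assumed hyperparameter
) -> float:
    """InfoNCE-style loss (assumed form): negative log of the share of
    exponentiated similarity mass that falls on the positive passages."""
    pos_mass = sum(math.exp(dot(query, p) / temperature) for p in positives)
    neg_mass = sum(math.exp(dot(query, n) / temperature) for n in negatives)
    return -math.log(pos_mass / (pos_mass + neg_mass))


query_emb = [1.0, 0.0]
gold = [1.0, 0.0]
random_negative = [0.0, 1.0]
history_hard_negative = [0.9, 0.1]  # passage from an earlier turn

loss_easy = history_aware_contrastive_loss(query_emb, [gold], [random_negative])
loss_hard = history_aware_contrastive_loss(
    query_emb, [gold], [random_negative, history_hard_negative]
)
```

Adding the historical hard negative raises the loss in this toy case, which is the pressure that would push the query embedding away from passages answering earlier turns.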