History-Aware Conversational Dense Retrieval
Fengran Mo, Chen Qu, Kelong Mao, Tianyu Zhu, Zhan Su, Kaiyu Huang, Jian-Yun Nie
TL;DR
HAConvDR tackles history noise and shortcut history dependency in conversational dense retrieval by combining context-denoised query reformulation with history-derived supervision signals in a three-stage pipeline. It first generates pseudo relevance judgments for historical turns, then reformulates the current query using only the turns judged relevant, and finally trains a dense retriever with a history-aware contrastive loss that integrates pseudo positives and historical hard negatives. On TopiOCQA and QReCC, HAConvDR yields significant gains over strong baselines, particularly in long sessions with topic switches, which supports the value of explicit history filtering and of using historical passages as supervision signals.
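The third stage can be illustrated with a minimal sketch of a contrastive loss that accepts several positives and extra hard negatives. The summary above does not give the exact formulation, so everything here is an assumption made for illustration: the function name `history_aware_contrastive_loss`, the dot-product similarity, the `temperature` parameter, and the choice to sum over all positives in the numerator.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot-product similarity between two embeddings."""
    return sum(a * b for a, b in zip(u, v))


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def history_aware_contrastive_loss(
    query: Sequence[float],
    positives: List[Sequence[float]],   # gold passage plus pseudo positives (assumed)
    negatives: List[Sequence[float]],   # sampled plus historical hard negatives (assumed)
    temperature: float = 1.0,
) -> float:
    """Hypothetical multi-positive InfoNCE-style loss.

    loss = -log( sum_p exp(s(q,p)/t) / (sum_p exp(s(q,p)/t) + sum_n exp(s(q,n)/t)) )
    """
    if not positives:
        raise ValueError("at least one positive passage is required")
    pos_scores = [dot(query, p) / temperature for p in positives]
    neg_scores = [dot(query, n) / temperature for n in negatives]
    return -(logsumexp(pos_scores) - logsumexp(pos_scores + neg_scores))
```

In this sketch, a query embedding that scores its positives above its negatives gets a loss near zero, while a hard negative that scores as high as the positive pushes the loss up to log 2.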
Abstract
Conversational search facilitates complex information retrieval by enabling multi-turn interactions between users and the system. Supporting such interactions requires a comprehensive understanding of the conversational inputs to formulate a good search query based on historical information. In particular, the search query should include the relevant information from the previous conversation turns. However, current approaches for conversational dense retrieval primarily rely on fine-tuning a pre-trained ad-hoc retriever using the whole conversational search session, which can be lengthy and noisy. Moreover, existing approaches are limited by the amount of manual supervision signals in the existing datasets. To address the aforementioned issues, we propose a History-Aware Conversational Dense Retrieval (HAConvDR) system, which incorporates two ideas: context-denoised query reformulation and automatic mining of supervision signals based on the actual impact of historical turns. Experiments on two public conversational search datasets demonstrate the improved history modeling capability of HAConvDR, in particular for long conversations with topic shifts.
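The idea of judging historical turns by their actual impact, then reformulating the query with only the helpful ones, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the criterion used here (a turn is kept if prepending it raises the score of the gold passage) is one plausible reading of "actual impact", the token-overlap scorer is a stand-in for a real dense retriever, and the names `judge_history` and `reformulate` are hypothetical.

```python
from typing import Callable, List


def token_overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of passage tokens that appear in the query.
    Stand-in for a dense retriever's similarity score (assumption)."""
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / len(p_tokens) if p_tokens else 0.0


def judge_history(
    current_query: str,
    history_turns: List[str],
    gold_passage: str,
    score: Callable[[str, str], float] = token_overlap_score,
) -> List[bool]:
    """Pseudo relevance judgment: a historical turn is labeled relevant when
    prepending it to the current query improves the gold passage's score."""
    base = score(current_query, gold_passage)
    return [
        score(f"{turn} {current_query}", gold_passage) > base
        for turn in history_turns
    ]


def reformulate(current_query: str, history_turns: List[str], labels: List[bool]) -> str:
    """Context-denoised reformulation: keep only turns judged relevant."""
    kept = [turn for turn, keep in zip(history_turns, labels) if keep]
    return " ".join(kept + [current_query])


history = ["tell me about the eiffel tower", "what is today's weather"]
query = "when was it built"
gold = "the eiffel tower was built in 1889"

labels = judge_history(query, history, gold)
denoised_query = reformulate(query, history, labels)
```

Here the first turn supplies the entity the current query refers to, so it is kept, while the unrelated weather turn is dropped from the reformulated query.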
