UniRetriever: Multi-task Candidates Selection for Various Context-Adaptive Conversational Retrieval

Hongru Wang, Boyang Xue, Baohang Zhou, Rui Wang, Fei Mi, Weichao Wang, Yasheng Wang, Kam-Fai Wong

TL;DR

This work tackles conversational retrieval by unifying three core candidate-selection tasks—persona, knowledge, and response—into a single multi-task framework. It introduces UniversalCR, a dual-encoder architecture with a context-adaptive dialogue encoder and a unified candidate encoder, trained with historical contrastive learning and pairwise similarity losses to leverage semi-hard negatives and model ranking across dialogue contexts. The approach demonstrates state-of-the-art retrieval quality within its training domain and strong zero-shot generalization across diverse datasets, while maintaining a simple dot-product bottleneck for efficiency. The results suggest that a universal retriever can effectively support multiple facets of conversational AI, offering both high performance and practical applicability for integrated dialogue systems.

Abstract

Conversational retrieval refers to an information retrieval system that operates in an iterative and interactive manner, requiring the retrieval of various external resources, such as persona, knowledge, and even responses, to engage with the user effectively and complete the dialogue successfully. However, most previous work trains an independent retriever for each specific resource, resulting in sub-optimal performance and low efficiency. Thus, we propose a multi-task framework that functions as a universal retriever for three dominant retrieval tasks during the conversation: persona selection, knowledge selection, and response selection. To this end, we design a dual-encoder architecture consisting of a context-adaptive dialogue encoder and a candidate encoder, aiming to attend to the relevant context in a long dialogue and retrieve suitable candidates with a simple dot product. Furthermore, we introduce two loss constraints to capture the subtle relationship between the dialogue context and different candidates by regarding historically selected candidates as hard negatives. Extensive experiments and analysis establish state-of-the-art retrieval quality both within and outside the training domain, revealing the promising potential and generalization capability of our model to serve as a universal retriever for different candidate selection tasks simultaneously.
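To make the dot-product retrieval and the role of historical negatives concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the toy vectors stand in for encoder outputs, the function names (`rank_candidates`, `contrastive_loss`) and the `temperature` parameter are hypothetical, and the loss is a generic InfoNCE-style contrastive loss used as a stand-in, since the exact form of the two loss constraints is not given here.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rank_candidates(context_vec, candidate_vecs):
    """Score every candidate by dot product and return (ranking, scores)."""
    scores = [dot(context_vec, c) for c in candidate_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

def contrastive_loss(context_vec, positive_vec, negative_vecs, temperature=1.0):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact loss):
    negative log-softmax of the positive among positive + negatives."""
    logits = [dot(context_vec, positive_vec) / temperature]
    logits += [dot(context_vec, n) / temperature for n in negative_vecs]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]

# Toy embeddings standing in for encoder outputs (hypothetical values).
context = [1.0, 0.0]       # dialogue context embedding
target = [0.9, 0.1]        # candidate that should be selected for the next turn
historical = [0.7, 0.3]    # candidate selected at an earlier turn
random_neg = [-0.5, 0.8]   # unrelated candidate

order, scores = rank_candidates(context, [historical, target, random_neg])

# Adding the historical candidate as an extra negative raises the loss,
# which pushes the model to separate it from the current target.
loss_plain = contrastive_loss(context, target, [random_neg])
loss_hist = contrastive_loss(context, target, [random_neg, historical])
```

In this toy setup the historical candidate scores close to the target, so it behaves like a hard negative: it contributes more to the loss than the unrelated candidate does.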

Paper Structure (20 sections, 10 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Different candidate selection tasks in a dialogue system: persona selection, knowledge selection, and response selection. Given $u_3$ in the dialogue context, $p_n$, $k_n$, and $r_2$ should be selected as the target persona, knowledge, and response for the next turn, respectively, while $p_2$, $k_1$, and $r_1$ are the historically selected persona, knowledge, and response for the earlier turn $u_1$.
  • Figure 2: The proposed Universal Conversational Retrieval framework, based on a dual-encoder architecture. It is trained to optimize a historical contrastive loss and a pairwise similarity loss, both made possible by the introduction of historical candidates.
  • Figure 3: The zero-shot performance of UniversalCR$_{single}$ and UniversalCR$_{full}$, and the supervised fine-tuning result of UniversalCR$_{full}$, on three new datasets: Knowledge Behind Persona (persona selection), DuSinc (knowledge selection), and KdConv (response selection).
  • Figure 4: The performance of UniversalCR with different k, or without any utterance from the previous session. The blue line denotes the performance of UniversalCR$_{full}$ without using any information from the previous session. We report the R@1 metric.
  • Figure 5: The performance of UniversalCR$_{full}$ on the three selection tasks (persona selection, knowledge selection, and response selection), with the number of candidates ranging from 256 to 2.
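
The R@1 metric reported in Figure 4 can be illustrated with a short sketch. It assumes one gold candidate per example and that higher scores are better; the helper name `recall_at_k` and the toy scores are hypothetical, and the paper's exact evaluation protocol is not given here.

```python
def recall_at_k(score_lists, gold_indices, k=1):
    """Fraction of examples whose gold candidate is among the k top-scored ones.

    score_lists[i][j] is the retrieval score of candidate j for example i;
    gold_indices[i] is the index of the single gold candidate (an assumption).
    """
    hits = 0
    for scores, gold in zip(score_lists, gold_indices):
        top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        hits += gold in top_k
    return hits / len(gold_indices)

# Three toy examples with three candidates each (made-up scores).
toy_scores = [
    [0.2, 0.9, 0.1],  # gold candidate 1 is ranked first
    [0.8, 0.3, 0.5],  # gold candidate 2 is ranked second
    [0.1, 0.4, 0.3],  # gold candidate 1 is ranked first
]
toy_gold = [1, 2, 1]

r_at_1 = recall_at_k(toy_scores, toy_gold, k=1)
r_at_2 = recall_at_k(toy_scores, toy_gold, k=2)
```

With a smaller candidate pool the gold candidate has fewer competitors, so R@1 generally becomes easier to achieve; the pool size matters when comparing numbers such as those in Figure 5.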