ConvSDG: Session Data Generation for Conversational Search

Fengran Mo, Bole Yi, Kelong Mao, Chen Qu, Kaiyu Huang, Jian-Yun Nie

TL;DR

This work proposes ConvSDG, a simple yet effective framework that explores the feasibility of boosting conversational search by using LLMs for session data generation, with unsupervised or semi-supervised learning depending on the availability of relevance judgments.

Abstract

Conversational search provides a more convenient search interface by allowing multi-turn interaction with the search engine. However, the effectiveness of conversational dense retrieval methods is limited by the scarcity of the training data required for fine-tuning. Generating more conversational training sessions with relevance labels could therefore improve search performance. Building on the promising text generation capabilities of large language models (LLMs), we propose ConvSDG, a simple yet effective framework that explores the feasibility of boosting conversational search by using LLMs for session data generation. Within this framework, we design dialogue/session-level and query-level data generation with unsupervised and semi-supervised learning, according to the availability of relevance judgments. The generated data are used to fine-tune the conversational dense retriever. Extensive experiments on four widely used datasets demonstrate the effectiveness and broad applicability of the ConvSDG framework compared with several strong baselines.

Paper Structure (21 sections, 3 equations, 4 figures, 3 tables)

This paper contains 21 sections, 3 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Overview of ConvSDG. It has three parts: (1) two prompts for session data generation at different levels; (2) supervision signals for the generated data, using PRF for session generation and existing annotations for query augmentation; (3) conversational dense retrieval fine-tuning with the generated data.
  • Figure 2: An example illustrating conversational session data generation at the dialogue level (left) and the query level (right).
  • Figure 3: Effectiveness of supervision signals generated from different query forms, based on the ANCE dense retriever.
  • Figure 4: Effectiveness of different sizes of generated data used for conversational fine-tuning under unsupervised (left) and semi-supervised (right) learning.
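
The supervision step described in the Figure 1 caption (PRF for generated sessions, existing annotations for augmented queries) can be illustrated with a minimal sketch. This is not the authors' implementation: the token-overlap scorer stands in for a real retriever, and the names `overlap_score`, `prf_pseudo_label`, `unsupervised_examples`, and `semi_supervised_examples` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def overlap_score(query: str, passage: str) -> int:
    """Toy lexical scorer (assumption): count of tokens shared by query and passage."""
    q = Counter(query.lower().split())
    p = Counter(passage.lower().split())
    return sum((q & p).values())


def prf_pseudo_label(turns: List[str], corpus: Dict[str, str], k: int = 1) -> List[str]:
    """Rank passages for the concatenated session and return the top-k ids
    as pseudo-positive labels (pseudo-relevance feedback)."""
    query = " ".join(turns)
    # Sort by descending score; break ties by passage id for determinism.
    ranked = sorted(corpus, key=lambda pid: (-overlap_score(query, corpus[pid]), pid))
    return ranked[:k]


def unsupervised_examples(
    turns: List[str], corpus: Dict[str, str], k: int = 1
) -> List[Tuple[str, str]]:
    """Dialogue-level case: a generated session has no labels, so PRF supplies them."""
    session_query = " ".join(turns)
    return [(session_query, pid) for pid in prf_pseudo_label(turns, corpus, k)]


def semi_supervised_examples(
    augmented_queries: List[str], gold_pid: str
) -> List[Tuple[str, str]]:
    """Query-level case: each generated query variant reuses the existing annotation."""
    return [(q, gold_pid) for q in augmented_queries]


if __name__ == "__main__":
    corpus = {
        "p1": "the eiffel tower is in paris",
        "p2": "python is a programming language",
    }
    generated_session = ["where is the eiffel tower", "how tall is it"]
    print(unsupervised_examples(generated_session, corpus))
    print(semi_supervised_examples(["eiffel tower height", "how tall is the eiffel tower"], "p1"))
```

Either list of (query, passage id) pairs could then serve as positives when fine-tuning a dense retriever; the fine-tuning itself is outside the scope of this sketch.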