Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering

Hongda Sun, Yuxuan Liu, Chengwei Wu, Haiyu Yan, Cheng Tai, Xin Gao, Shuo Shang, Rui Yan

TL;DR

LLMQA is a generalized framework that formulates the ODQA process as three basic steps (query expansion, document selection, and answer generation) and combines the strengths of retrieval-based and generation-based evidence.

Abstract

Open-domain question answering (ODQA) has emerged as a pivotal research focus in information systems. Existing methods follow two main paradigms to collect evidence: (1) the retrieve-then-read paradigm retrieves pertinent documents from an external corpus; and (2) the generate-then-read paradigm employs large language models (LLMs) to generate relevant documents. However, neither can fully address the multifaceted requirements for evidence. To this end, we propose LLMQA, a generalized framework that formulates the ODQA process as three basic steps: query expansion, document selection, and answer generation, combining the strengths of both retrieval-based and generation-based evidence. Since LLMs are capable of accomplishing a wide variety of tasks, we instruct LLMs to play multiple roles as generators, rerankers, and evaluators within our framework, integrating them to collaborate in the ODQA process. Furthermore, we introduce a novel prompt optimization algorithm to refine role-playing prompts and steer LLMs to produce higher-quality evidence and answers. Extensive experimental results on widely used benchmarks (NQ, WebQ, and TriviaQA) demonstrate that LLMQA achieves the best performance in terms of both answer accuracy and evidence quality, showcasing its potential for advancing ODQA research and applications.

Paper Structure

This paper contains 23 sections, 7 equations, 5 figures, 9 tables, 1 algorithm.

Figures (5)

  • Figure 1: Collaborative interactions of multiple LLM roles.
  • Figure 2: Overview of LLMQA. Three role-playing LLMs execute five main steps: (a) the generator produces query expansions for the question; (b) the evaluator selects the best query expansion; (c) the reranker reranks the top-$k$ documents given the question and the generated expansion; (d) the evaluator selects the best reranked documents; (e) the generator produces the answer from the question, the generated expansion, and the reranked documents. Sliding-window reranking is shown in more detail: selecting the top-2 documents from the top-5 retrieved candidates with window size $w=3$ and step $l=1$.
  • Figure 3: Analysis of query expansion.
  • Figure 4: Impact of document number.
  • Figure 5: Case study for prompt optimization. The EM score for the initial prompt is 54.82, and the EM score for the optimized prompt is 57.15. The results are reported on WebQ.
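
The sliding-window reranking setting named in the Figure 2 caption (top-2 out of top-5 candidates, window size $w=3$, step $l=1$) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `sliding_window_rerank`, the `rank_window` callable (a stand-in for the LLM reranker), the back-to-front sliding direction, and the toy relevance scores are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def sliding_window_rerank(
    docs: Sequence[str],
    rank_window: Callable[[List[str]], List[str]],
    window: int = 3,
    step: int = 1,
    top_n: int = 2,
) -> List[str]:
    """Rerank `docs` by sliding a window from the end of the list to the front.

    `rank_window` is a hypothetical stand-in for the LLM reranker: it receives
    the documents inside the current window and returns them best-first.
    """
    ranked = list(docs)
    # Assumption: start with the last `window` documents and move toward the
    # front, so strong documents can bubble up to the head of the list.
    start = max(len(ranked) - window, 0)
    while True:
        end = start + window
        ranked[start:end] = rank_window(ranked[start:end])
        if start == 0:
            break
        start = max(start - step, 0)
    return ranked[:top_n]


# Toy relevance scores (hypothetical) replacing the LLM's judgment.
relevance = {"d1": 0.2, "d2": 0.1, "d3": 0.5, "d4": 0.9, "d5": 0.7}


def toy_reranker(window_docs: List[str]) -> List[str]:
    # Sort the window's documents from most to least relevant.
    return sorted(window_docs, key=lambda d: relevance[d], reverse=True)


top2 = sliding_window_rerank(["d1", "d2", "d3", "d4", "d5"], toy_reranker)
print(top2)  # ['d4', 'd5']
```

With five candidates, $w=3$, and $l=1$, the sketch makes three reranker calls, over positions 3 to 5, then 2 to 4, then 1 to 3. Each call sees only three documents, which keeps the reranker's input short while still letting the two most relevant documents reach the front.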