NORMY: Non-Uniform History Modeling for Open Retrieval Conversational Question Answering

Muhammad Shihab Rashid, Jannat Ara Meem, Vagelis Hristidis

TL;DR

The paper tackles the suboptimal practice of uniform history modeling across the Retriever, Reranker, and Reader in Open Retrieval Conversational QA. It introduces NORMY, an unsupervised pipeline that assigns a broad history to retrieval, a narrower window of recent turns to reranking, and a coreference-resolved, self-contained query to reading, achieving superior module-level and end-to-end performance. It also proposes a novel Retriever that builds keyphrase-based history and reuses passages retrieved in earlier turns, and it introduces doc2dial-Or, an extended version of doc2dial. Comprehensive experiments on three diverse datasets show clear gains over state-of-the-art baselines. The work offers practical implications for robust OrConvQA systems and provides datasets and code to facilitate further research.
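The per-module split described above can be sketched as three query builders over the same conversation. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `build_module_inputs`, `toy_keyphrases` and `toy_rewrite`, the stopword list, and the pronoun map are all hypothetical. The toy keyphrase extractor keeps the first `y` non-stopword tokens of each turn, and the toy rewriter substitutes pronouns from a hand-written map where a real coreference-resolution model would go.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical, tiny stopword list used only by the toy keyphrase extractor.
STOPWORDS = {"what", "who", "when", "where", "did", "does", "is", "was",
             "the", "a", "an", "of", "on", "in", "to", "about"}

# Hypothetical pronoun -> antecedent map standing in for coreference output.
ANTECEDENTS: Dict[str, str] = {"she": "Ada Lovelace", "it": "the Analytical Engine"}


@dataclass
class ModuleInputs:
    retriever_query: str  # broadest context: keyphrases from all turns
    reranker_query: str   # narrower context: only the w most recent turns
    reader_query: str     # single self-contained rewrite of the last question


def toy_keyphrases(turn: str, y: int = 2) -> List[str]:
    """Placeholder extractor: the first y non-stopword tokens of a turn."""
    tokens = re.findall(r"[a-z]+", turn.lower())
    return [t for t in tokens if t not in STOPWORDS][:y]


def toy_rewrite(history: List[str], question: str) -> str:
    """Placeholder for coreference resolution: substitute mapped pronouns."""
    return re.sub(r"\b(she|it)\b", lambda m: ANTECEDENTS[m.group(1)], question)


def build_module_inputs(history: List[str], question: str, w: int,
                        extract: Callable[[str], List[str]],
                        rewrite: Callable[[List[str], str], str]) -> ModuleInputs:
    # Retriever: keyphrases from every history turn, then the current question.
    keyphrases = [kp for turn in history for kp in extract(turn)]
    retriever_query = " ".join(keyphrases + [question])
    # Reranker: full text of the w most recent turns only
    # (guarded because history[-0:] would return the whole list).
    recent = history[-w:] if w > 0 else []
    reranker_query = " ".join(recent + [question])
    # Reader: one standalone question, no raw history attached.
    reader_query = rewrite(history, question)
    return ModuleInputs(retriever_query, reranker_query, reader_query)


history = ["Who was Ada Lovelace?", "What is the Analytical Engine?"]
inputs = build_module_inputs(history, "When did she describe it?", w=1,
                             extract=toy_keyphrases, rewrite=toy_rewrite)
print(inputs.retriever_query)  # ada lovelace analytical engine When did she describe it?
print(inputs.reranker_query)   # What is the Analytical Engine? When did she describe it?
print(inputs.reader_query)     # When did Ada Lovelace describe the Analytical Engine?
```

The three outputs shrink in scope from left to right: the Retriever sees a compressed summary of every turn, the Reranker sees only the last `w` turns verbatim, and the Reader sees a single question with no raw history attached.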

Abstract

Open Retrieval Conversational Question Answering (OrConvQA) answers a question given a conversation as context and a document collection. A typical OrConvQA pipeline consists of three modules: a Retriever to retrieve relevant documents from the collection, a Reranker to rerank them given the question and the context, and a Reader to extract an answer span. The conversational turns can provide valuable context for answering the final query. State-of-the-art OrConvQA systems use the same history modeling for all three modules of the pipeline. We hypothesize that this is suboptimal. Specifically, we argue that a broader context is needed in the first modules of the pipeline so that relevant documents are not missed, while a narrower context is needed in the last modules to identify the exact answer span. We propose NORMY, the first unsupervised non-uniform history modeling pipeline, which generates the best conversational history for each module. We further propose a novel Retriever for NORMY, which employs keyphrase extraction on the conversation history and leverages passages retrieved in previous turns as additional context. We also created a new dataset for OrConvQA by expanding the doc2dial dataset. We implemented various state-of-the-art history modeling techniques and comprehensively evaluated them separately for each module of the pipeline on three datasets: OR-QUAC, our doc2dial extension, and ConvMix. Our extensive experiments show that NORMY outperforms the state-of-the-art in the individual modules and in the end-to-end system.
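The abstract's Retriever idea is to expand the query with keyphrases from the conversation history before retrieving, and the Figure 2 caption names BM25 as the retrieval function. A from-scratch Okapi BM25 scorer can show why the broader context helps. This is an illustrative sketch, not the authors' code. The three-passage collection, the `BM25` class and the hand-written expanded query are assumptions. The parameters k1 = 1.5 and b = 0.75 are common defaults, not values reported in the paper.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over an in-memory collection (common defaults k1=1.5, b=0.75)."""

    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tf = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))
        # Non-negative IDF variant (as used by Lucene's BM25).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):
            f = tf.get(t, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int) -> List[int]:
        """Indices of the k best-scoring passages (ties broken by index)."""
        ranked = sorted(range(len(self.docs)), key=lambda i: (-self.score(query, i), i))
        return ranked[:k]


passages = [
    "Ada Lovelace described the Analytical Engine in notes published in 1843.",
    "The Difference Engine was designed by Charles Babbage.",
    "She described the weather in a letter.",
]
bm25 = BM25(passages)
bare_query = "When did she describe it?"
# Hypothetical keyphrase-expanded query: history keyphrases + current question.
expanded_query = "ada lovelace analytical engine When did she describe it?"
print(bm25.top_k(bare_query, 1))      # [2]: matches only on "she"
print(bm25.top_k(expanded_query, 1))  # [0]: the passage about Lovelace's notes
```

With the bare question, the only lexical overlap is the pronoun "she", so the unrelated passage wins. Adding history keyphrases lets the relevant passage surface, which matches the intuition of giving the first module the broadest context.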

Paper Structure

This paper contains 16 sections, 8 equations, 3 figures, 6 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Example of the impact of non-uniform conversational history modeling. Full Conversational Context (FC) retrieves the most relevant passages in the Retriever module, while a narrower context, Last Question Rewrite (LQR), predicts the correct answer span in the Reader module.
  • Figure 2: The architecture of $\mathsf{NORMY}$. The input is the current question $q_n$, all history questions $q_i^{n-1}$, and the document collection $D$. The Retriever module models the history using keyphrase extraction per history turn and retrieves passages $P_0 \cdots P_k$ using BM25. Our novel History Aware Decay Scoring module refines all returned passages and outputs the top-k. The Reranker reranks the passages using the most recent $w$ turns, and the Reader uses coreference resolution to rewrite the last query $q_n$ and outputs the best answer span by combining the scores of all three modules.
  • Figure 3: Panels (a) and (b) show the impact of the number of keywords $\textbf{y}$ and the history window size $\textbf{w}$ on the Retriever and the Reranker.
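The Figure 2 caption mentions a History Aware Decay Scoring module that refines passages gathered across turns, and a Reader that combines the scores of all three modules. This summary gives neither formula, so the sketch below shows one plausible reading and is purely hypothetical. A passage retrieved t turns ago has its retrieval score multiplied by `decay ** t` and summed with its current-turn score. Answer candidates are then ranked by a weighted sum of retriever, reranker and reader scores. The names `decay_merge`, `combine` and `select_answer`, the decay factor, and the equal weights are assumptions, not the paper's definitions.

```python
from typing import Dict, List, Sequence, Tuple

# NOTE: exponential decay and weighted-sum fusion are assumptions made for
# illustration; they are not taken from the NORMY paper.


def decay_merge(turn_results: List[Dict[str, float]], decay: float,
                k: int) -> List[Tuple[str, float]]:
    """Merge per-turn retrieval scores, down-weighting older turns.

    turn_results[j] maps passage id -> retrieval score at turn j; the last
    entry is the current turn. A passage retrieved t turns ago contributes
    score * decay**t, and contributions are summed across turns.
    """
    current = len(turn_results) - 1
    merged: Dict[str, float] = {}
    for j, results in enumerate(turn_results):
        weight = decay ** (current - j)
        for pid, score in results.items():
            merged[pid] = merged.get(pid, 0.0) + weight * score
    # Highest merged score first; ties broken by passage id for determinism.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def combine(retriever: float, reranker: float, reader: float,
            weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three module scores for one answer candidate."""
    a, b, c = weights
    return a * retriever + b * reranker + c * reader


def select_answer(candidates: List[Tuple[str, float, float, float]],
                  weights: Sequence[float] = (1.0, 1.0, 1.0)) -> str:
    """Return the span whose combined (retriever, reranker, reader) score is highest."""
    best = max(candidates, key=lambda cand: combine(cand[1], cand[2], cand[3], weights))
    return best[0]


# Turn 0 retrieved p1 and p2; the current turn (1) retrieved p2 and p3.
history_scores = [{"p1": 2.0, "p2": 1.0}, {"p2": 1.5, "p3": 1.8}]
print(decay_merge(history_scores, decay=0.5, k=2))  # [('p2', 2.0), ('p3', 1.8)]

# Each candidate: (span, retriever score, reranker score, reader score).
spans = [("1843", 2.0, 0.9, 0.7), ("1842", 1.0, 0.4, 0.8)]
print(select_answer(spans))  # 1843
```

The decay factor controls how much earlier retrievals count. With `decay` near 0 only the current turn's passages matter, while with `decay = 1` passages from earlier turns weigh as much as new ones. Here `p2`, retrieved in both turns, overtakes `p3`.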