Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index

Minjoon Seo; Jinhyuk Lee; Tom Kwiatkowski; Ankur P. Parikh; Ali Farhadi; Hannaneh Hajishirzi

TL;DR

This paper introduces the Dense-Sparse Phrase Index (DenSPI), a real-time open-domain QA approach that decouples question processing from document encoding by indexing query-agnostic phrase representations offline. Each phrase is represented by a dense contextual embedding combined with sparse tf-idf-based lexical features, so answering a question reduces to a fast inner-product search over tens of billions of phrases (up to 60 billion for the whole of Wikipedia). Memory- and compute-efficient strategies (pointers, filtering, and quantization) shrink the index to roughly 1.2 TB. The authors report substantial speedups on CPU-only setups and competitive accuracy on SQuAD v1.1 and SQuAD-Open, with ablations supporting the importance of the coherency component and the hybrid search strategy. The work shows that dense-sparse phrase indexing can deliver near real-time QA at Wikipedia scale, while highlighting a decomposability gap that motivates further refinements in phrase representation and search techniques.

Abstract

Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on-demand for every input query. In this paper, we introduce the query-agnostic indexable representation of document phrases that can drastically speed up open-domain QA and also allows us to reach long-tail targets. In particular, our dense-sparse phrase encoding effectively captures syntactic, semantic, and lexical information of the phrases and eliminates the pipeline filtering of context documents. Leveraging optimization strategies, our model can be trained in a single 4-GPU server and serve entire Wikipedia (up to 60 billion phrases) under 2TB with CPUs only. Our experiments on SQuAD-Open show that our model is more accurate than DrQA (Chen et al., 2017) with 6000x reduced computational cost, which translates into at least 58x faster end-to-end inference benchmark on CPUs.
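To make the idea of a query-agnostic dense-sparse phrase index concrete, here is a minimal sketch in Python with NumPy. It is a toy illustration, not the authors' implementation: the function names (build_index, encode_query, search), the hand-written vectors, and the brute-force search are all assumptions made for the example. In the real system the dense vectors would come from a learned encoder, the sparse part would be a very high-dimensional tf-idf vector, and the search would need to be far more efficient than an exhaustive scan.

```python
import numpy as np

def build_index(phrase_dense, phrase_sparse):
    """Offline step: concatenate dense and sparse phrase vectors, one row per phrase."""
    return np.hstack([phrase_dense, phrase_sparse])

def encode_query(q_dense, q_sparse):
    """Query-time step: encode the question in the same concatenated space."""
    return np.concatenate([q_dense, q_sparse])

def search(index, query, k=1):
    """Brute-force inner-product search; returns the top-k phrase ids and scores."""
    scores = index @ query
    top = np.argsort(-scores)[:k]
    return top, scores[top]

# Toy data (hypothetical): 3 phrases, 2-dim dense part, 4-term sparse vocabulary.
phrase_dense = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [0.5, 0.5]])
phrase_sparse = np.array([[0.0, 0.0, 1.0, 0.0],
                          [1.0, 0.0, 0.0, 0.0],
                          [0.0, 2.0, 0.0, 0.0]])
index = build_index(phrase_dense, phrase_sparse)

q_dense = np.array([0.0, 1.0])
q_sparse = np.array([0.0, 1.0, 0.0, 0.0])
query = encode_query(q_dense, q_sparse)

top_ids, top_scores = search(index, query, k=2)
print(top_ids, top_scores)
```

Because the phrase and query vectors are concatenations, each score is the sum of a dense inner product and a sparse inner product. In this toy case phrase 2 ranks first because its lexical overlap with the question outweighs the better dense match of phrase 1. The index is built without ever seeing a question, which is what allows it to be computed offline.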
