Entity Retrieval for Answering Entity-Centric Questions

Hassan S. Shavarani, Anoop Sarkar

TL;DR

Entity-centric questions are difficult for standard retrieval methods, which rely on question-document similarity. The authors propose Entity Retrieval, which uses the salient entities in a question to fetch the corresponding knowledge-base articles, truncates each article to a fixed window, and uses the result to augment LLM prompts. In evaluations on FactoidQA and EntityQuestions, Entity Retrieval achieves higher retrieval quality and QA accuracy than BM25, DPR, and ANCE while retrieving fewer documents and running more efficiently, and SpEL-based entity linking lets it operate without manual annotations. The work demonstrates practical gains for offline/embedded deployment and highlights the importance of robust entity linking and knowledge-base selection for real-world entity-centric QA.

Abstract

The similarity between the question and indexed documents is a crucial factor in document retrieval for retrieval-augmented question answering. Although this is typically the only method for obtaining the relevant documents, it is not the sole approach when dealing with entity-centric questions. In this study, we propose Entity Retrieval, a novel retrieval method which, rather than relying on question-document similarity, depends on the salient entities within the question to identify the documents to retrieve. We conduct an in-depth analysis of the performance of both dense and sparse retrieval methods in comparison to Entity Retrieval. Our findings reveal that our method not only leads to more accurate answers to entity-centric questions but also operates more efficiently.
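The lookup-based idea summarized above (find the salient entities, fetch their knowledge-base articles, truncate each to a fixed window, and prepend the result to the prompt) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory `KNOWLEDGE_BASE`, its toy article text, the substring-matching `link_entities` stand-in (the paper uses SpEL for entity linking, which is not reproduced here), the 100-word default window, and the prompt template are all hypothetical and not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical toy knowledge base mapping entity titles to article text.
# A real system would look entities up in a Wikipedia-scale store.
KNOWLEDGE_BASE: Dict[str, str] = {
    "Seine-Saint-Denis": (
        "Seine-Saint-Denis is a department of France. "
        "Its prefecture is Bobigny."
    ),
}


def link_entities(question: str, kb: Dict[str, str]) -> List[str]:
    """Stand-in entity linker: case-insensitive title matching.

    This only illustrates the interface; it is not an entity linking model.
    """
    lowered = question.lower()
    return [title for title in kb if title.lower() in lowered]


def truncate(article: str, max_words: int) -> str:
    """Keep only the first `max_words` whitespace-separated words."""
    return " ".join(article.split()[:max_words])


def entity_retrieval(question: str, kb: Dict[str, str],
                     max_words: int = 100) -> List[str]:
    """Return one truncated article per entity found in the question."""
    return [truncate(kb[title], max_words)
            for title in link_entities(question, kb)]


def build_prompt(question: str, documents: List[str]) -> str:
    """Prepend the retrieved documents to the question (assumed template)."""
    context = "\n".join(documents)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    q = "What is the capital of Seine-Saint-Denis?"
    docs = entity_retrieval(q, KNOWLEDGE_BASE)
    print(build_prompt(q, docs))
```

Because retrieval here is a dictionary lookup rather than a search over an index of passages, the number of returned documents equals the number of linked entities, which is why a question with one salient entity yields a single augmentation document.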

Paper Structure

This paper contains 17 sections, 4 figures, and 10 tables.

Figures (4)

  • Figure 1: Entity Retrieval simplifies the process of obtaining augmentation documents by replacing the search through large collections of indexed passages with a straightforward lookup. For Q: What is the capital of Seine-Saint-Denis? Entity Retrieval considers the first few sentences of the Seine-Saint-Denis Wikipedia article, which state "Its prefecture is Bobigny.", and returns A = Bobigny, whereas the other retrieval methods return A = Saint-Denis or A = Paris.
  • Figure 2: The first paragraph of a Wikipedia article typically provides an informative summary of the entity. For example, the first paragraph of the Swan Lake Wikipedia article contains the answer to "Who is the composer of The Swan Lake ballet?"
  • Figure 3: nDCG@$k$ scores evaluate the quality of BM25, DPR, ANCE, and Entity Retrieval by considering both the relevance and the position of documents in the top $k$ retrieved passages for each question. Note that Entity Retrieval typically results in $k$=1 document since the datasets under study often have one salient entity. The horizontal lines aid in visually comparing the performance of Entity Retrieval, which averages one document, to other methods retrieving $k$>1 documents.
  • Figure 4: Retrieval Accuracy scores showing the correlation between the number of retrieved documents and the coverage of the expected answers in the EntityQuestions (dev) subset.
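The two retrieval metrics named in the captions of Figures 3 and 4 can be made concrete with a short Python sketch. It rests on assumptions that may not match the paper's evaluation code: nDCG@$k$ is computed with the common linear-gain, log2-discount formulation, the ideal ranking is taken over the retrieved list only, and retrieval accuracy is taken to mean the fraction of questions for which at least one of the top $k$ passages contains an expected answer string. The function names are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked relevance scores."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the best reordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def retrieval_accuracy(retrieved: Sequence[Sequence[str]],
                       answers: Sequence[Sequence[str]],
                       k: int) -> float:
    """Fraction of questions whose top-k passages contain an expected answer.

    Matching is by case-insensitive substring, an assumption of this sketch.
    """
    if not retrieved:
        return 0.0
    hits = 0
    for passages, golds in zip(retrieved, answers):
        top_k = passages[:k]
        if any(g.lower() in p.lower() for p in top_k for g in golds):
            hits += 1
    return hits / len(retrieved)


if __name__ == "__main__":
    # One relevant passage ranked second out of two.
    print(ndcg_at_k([0, 1], k=2))
    # Two questions, one passage each; only the first contains its answer.
    print(retrieval_accuracy(
        [["Its prefecture is Bobigny."], ["An unrelated passage."]],
        [["Bobigny"], ["Saint-Denis"]],
        k=1,
    ))
```

Under these definitions, a method that returns a single relevant document at rank 1 scores an nDCG of 1.0 for that question, which is why a one-document method can be compared against the other retrievers at larger $k$.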