DREQ: Document Re-Ranking Using Entity-based Query Understanding

Shubham Chatterjee, Iain Mackie, Jeff Dalton

TL;DR

DREQ starts from the observation that the entities in a document are not equally relevant to a given query. It learns a query-specific, entity-centric document representation and combines it with a text-centric representation into a hybrid embedding for re-ranking. A dedicated entity-ranking component scores entities using query-conditioned embeddings. These scores weight each entity's contribution to the document representation, which is then fused with the text-based representation, and a learned interaction model computes the relevance score. The model is trained end-to-end with binary cross-entropy, tuning both the entity and document components. Experiments on CODEC, Robust04, News 2021, and Core 2018 show state-of-the-art performance, with gains on difficult queries. Ablations show that entity weighting is critical and examine the impact of different entity-ranking choices.

Abstract

While entity-oriented neural IR models have advanced significantly, they often overlook a key nuance: the varying degrees of influence individual entities within a document have on its overall relevance. Addressing this gap, we present DREQ, an entity-oriented dense document re-ranking model. Uniquely, we emphasize the query-relevant entities within a document's representation while simultaneously attenuating the less relevant ones, thus obtaining a query-specific entity-centric document representation. We then combine this entity-centric document representation with the text-centric representation of the document to obtain a "hybrid" representation of the document. We learn a relevance score for the document using this hybrid representation. Using four large-scale benchmarks, we show that DREQ outperforms state-of-the-art neural and non-neural re-ranking methods, highlighting the effectiveness of our entity-oriented representation approach.
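
The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that entity relevance scores are already available, that the entity-centric embedding is a score-weighted sum of entity embeddings, that the hybrid embedding is a plain concatenation (a stand-in for the learned interaction model), and that the scoring head is a single linear layer with a sigmoid. All function names and values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def weighted_entity_embedding(entity_embs: Sequence[Vector],
                              entity_scores: Sequence[float]) -> Vector:
    """Score-weighted sum of entity embeddings.

    Entities with high query-specific scores dominate the result;
    entities with low scores are attenuated.
    """
    dim = len(entity_embs[0])
    out = [0.0] * dim
    for emb, score in zip(entity_embs, entity_scores):
        for i in range(dim):
            out[i] += score * emb[i]
    return out


def hybrid_embedding(text_emb: Vector, entity_emb: Vector) -> Vector:
    """Assumption: concatenation stands in for the learned interaction."""
    return list(text_emb) + list(entity_emb)


def relevance_score(hybrid: Vector, weights: Vector, bias: float = 0.0) -> float:
    """Hypothetical scoring head: linear layer followed by a sigmoid."""
    logit = sum(w * x for w, x in zip(weights, hybrid)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def bce_loss(pred: float, label: int) -> float:
    """Binary cross-entropy for one (query, document) pair."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(pred + eps)
             + (1 - label) * math.log(1.0 - pred + eps))


# Toy example: two entities, the first far more query-relevant.
entity_embs = [[1.0, 0.0], [0.0, 1.0]]
entity_scores = [0.9, 0.1]
text_emb = [0.5, 0.5]

entity_centric = weighted_entity_embedding(entity_embs, entity_scores)
hybrid = hybrid_embedding(text_emb, entity_centric)
score = relevance_score(hybrid, weights=[1.0, 1.0, 1.0, 1.0])
loss = bce_loss(score, label=1)
```

In an end-to-end setup, the scoring weights and the encoders producing the embeddings would be updated by backpropagating this loss. The sketch only computes the forward pass for a single pair.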

Paper Structure (26 sections, 5 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Our proposed system DREQ uses a hybrid document embedding learnt using (1) the query-specific entity-centric embedding, and (2) text embedding of the document to learn the document score.
  • Figure 2: Difficulty test for nDCG@20 on Robust04 (title). DREQ improves performance for the most difficult queries.