DREQ: Document Re-Ranking Using Entity-based Query Understanding
Shubham Chatterjee, Iain Mackie, Jeff Dalton
TL;DR
DREQ starts from the observation that not all entities in a document are equally relevant to a given query. It learns a query-specific, entity-centric document representation and combines it with a text-centric representation into a hybrid embedding for re-ranking. A dedicated entity-ranking component scores entities using query-conditioned embeddings, and these scores weight each entity's contribution to the document representation. The weighted entity representation is then fused with the text-based representation, and a learned interaction model computes a joint relevance score. The entity and document components are trained end-to-end with binary cross-entropy. Experiments on CODEC, Robust04, News 2021, and Core 2018 show state-of-the-art performance, with gains that hold on difficult queries. Ablations show that entity weighting is critical and examine the impact of different entity-ranking choices.
Abstract
While entity-oriented neural IR models have advanced significantly, they often overlook a key nuance: the varying degrees of influence individual entities within a document have on its overall relevance. Addressing this gap, we present DREQ, an entity-oriented dense document re-ranking model. Uniquely, we emphasize the query-relevant entities within a document's representation while simultaneously attenuating the less relevant ones, thus obtaining a query-specific entity-centric document representation. We then combine this entity-centric document representation with the text-centric representation of the document to obtain a "hybrid" representation of the document. We learn a relevance score for the document using this hybrid representation. Using four large-scale benchmarks, we show that DREQ outperforms state-of-the-art neural and non-neural re-ranking methods, highlighting the effectiveness of our entity-oriented representation approach.
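The pipeline described above (weight entities by query relevance, build a hybrid document representation, learn a relevance score) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the sum-normalisation of entity scores, concatenation as the fusion step, and a single linear layer with a sigmoid as the "interaction model". The real model uses learned neural embeddings; plain Python lists stand in for them.

```python
import math

def entity_centric_embedding(entity_embs, entity_scores):
    """Weighted sum of entity embeddings.

    Assumption: scores are non-negative and normalised by their sum, so
    higher-scored (more query-relevant) entities dominate the result.
    """
    total = sum(entity_scores)
    if total <= 0:
        raise ValueError("entity scores must sum to a positive value")
    weights = [s / total for s in entity_scores]
    dim = len(entity_embs[0])
    return [sum(w * e[i] for w, e in zip(weights, entity_embs)) for i in range(dim)]

def hybrid_embedding(text_emb, entity_emb):
    """Assumption: fuse the text-centric and entity-centric vectors by concatenation."""
    return list(text_emb) + list(entity_emb)

def relevance_score(query_emb, hybrid_emb, weights, bias=0.0):
    """Hypothetical interaction model: one linear layer over [query; hybrid], then a sigmoid."""
    features = list(query_emb) + list(hybrid_emb)
    if len(weights) != len(features):
        raise ValueError("weights must match the feature dimension")
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

def bce_loss(prob, label):
    """Binary cross-entropy for one (query, document) pair; label is 0 or 1."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

# Toy usage: two entities, the first judged three times as relevant as the second.
doc_entity = entity_centric_embedding([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
hybrid = hybrid_embedding([0.2, 0.4], doc_entity)
score = relevance_score([0.1, 0.3], hybrid, weights=[0.0] * 6)
loss = bce_loss(score, label=1)
```

In this toy setup the first entity receives weight 0.75 and the second 0.25, so the entity-centric vector leans toward the entity ranked as more relevant to the query. In end-to-end training, the loss would be backpropagated through both the scoring layer and the components that produce the embeddings and entity scores; that step is omitted here.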
