REGENT: Relevance-Guided Attention for Entity-Aware Multi-Vector Neural Re-Ranking

Shubham Chatterjee

TL;DR

REGENT tackles long-form information needs by embedding relevance signals directly into neural attention through a dual-pathway architecture that blends token-level BM25 lexical cues with query-specific entity semantics. The approach introduces relevance-guided attention, token-level BM25 integration, and dynamic entity-aware processing, achieving state-of-the-art re-ranking on three long-document benchmarks with up to 108% improvement over BM25 and notable gains over ColBERT and RankVicuna. Ablation studies show the entity semantic skeleton as the core driver of performance, with lexical signals providing important fine-grained grounding. This work establishes a new paradigm for entity-aware neural IR by tightly weaving lexical and semantic signals into neural attention, enabling robust handling of complex queries in long documents.

Abstract

Current neural re-rankers often struggle with complex information needs and long, content-rich documents. The fundamental issue is not computational but one of intelligent content selection: identifying what matters in lengthy, multi-faceted texts. While humans naturally anchor their understanding around key entities and concepts, neural models process text within rigid token windows, treating all interactions as equally important and missing critical semantic signals. We introduce REGENT, a neural re-ranking model that mimics human-like understanding by using entities as a "semantic skeleton" to guide attention. REGENT integrates relevance guidance directly into the attention mechanism, combining fine-grained lexical matching with high-level semantic reasoning. This relevance-guided attention enables the model to focus on conceptually important content while maintaining sensitivity to precise term matches. REGENT achieves new state-of-the-art performance on three challenging datasets, providing up to 108% improvement over BM25 and consistently outperforming strong baselines including ColBERT and RankVicuna. To our knowledge, this is the first work to successfully integrate entity semantics directly into neural attention, establishing a new paradigm for entity-aware information retrieval.

Paper Structure

This paper contains 25 sections, 5 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: REGENT Architecture Overview. The model processes query and document inputs through separate BERT encoders to generate contextual embeddings. REGENT employs a dual-pathway attention mechanism: (1) The token pathway enhances document key and value representations with BM25 scores, then applies cross-attention to capture lexical matches between query and document tokens. (2) The entity pathway first projects pre-computed entity embeddings to the hidden dimension, computes entity-entity attention to identify semantically related concepts, then uses this entity context to guide token-level attention. An adaptive fusion mechanism learns to combine both pathways, balancing lexical matching with semantic understanding. The final output undergoes feed-forward processing with residual connections before mean pooling and scoring to produce a relevance score. This architecture enables fine-grained integration of traditional IR signals (BM25) with neural semantic reasoning through entities, moving beyond post-hoc score combination to embedded relevance guidance within the attention mechanism itself.
  • Figure 2: Query-level analysis on TREC Core 2018.
  • Figure 3: Difficulty test on TREC Core 2018 (Core18). The 5% of queries that are most difficult for BM25 appear on the left and the 5% easiest on the right. Performance is reported as macro-averages across queries.
  • Figure 4: Visualization of entity attention patterns for query "Sony Cyberattack".
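
The dual-pathway design described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes random arrays in place of BERT outputs and pre-computed entity embeddings, additive injection of BM25 scores into keys and values, mean pooling of the entity context as the "guidance" signal, and a sigmoid gate for fusion. All names (`regent_like_score`, `init_params`, `bm25_k`, `Wg`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def init_params(d, d_ent, rng):
    # Randomly initialised weights; a trained model would learn these.
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)
    return {
        "Wq": w(d, d), "Wk": w(d, d), "Wv": w(d, d),
        "bm25_k": w(d), "bm25_v": w(d),    # assumed additive BM25 injection
        "We": w(d_ent, d),                 # entity projection to hidden size
        "Wg": w(2 * d, d),                 # fusion gate
        "W1": w(d, 4 * d), "W2": w(4 * d, d),
        "w_out": w(d),
    }

def regent_like_score(q_tok, d_tok, bm25, q_ent, d_ent, p):
    q = q_tok @ p["Wq"]
    k_plain = d_tok @ p["Wk"]
    v_plain = d_tok @ p["Wv"]

    # (1) Token pathway: BM25-enhanced keys/values, then cross-attention.
    k = k_plain + bm25[:, None] * p["bm25_k"]
    v = v_plain + bm25[:, None] * p["bm25_v"]
    tok_out = attention(q, k, v)                       # (n_q, d)

    # (2) Entity pathway: project entities, attend entity-to-entity,
    # then let the pooled entity context shift the token queries.
    qe, de = q_ent @ p["We"], d_ent @ p["We"]
    ent_ctx = attention(qe, de, de)                    # (m_q, d)
    guide = ent_ctx.mean(axis=0)                       # (d,)
    ent_out = attention(q + guide, k_plain, v_plain)   # (n_q, d)

    # Adaptive fusion: a learned gate mixes the two pathways per dimension.
    gate_in = np.concatenate([tok_out, ent_out], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(gate_in @ p["Wg"])))
    fused = gate * tok_out + (1.0 - gate) * ent_out

    # Feed-forward with residual connections, mean pooling, linear scoring.
    h = fused + q_tok
    h = h + np.maximum(h @ p["W1"], 0.0) @ p["W2"]
    return float(h.mean(axis=0) @ p["w_out"])

rng = np.random.default_rng(0)
params = init_params(d=16, d_ent=8, rng=rng)
score = regent_like_score(
    q_tok=rng.normal(size=(4, 16)),    # 4 query tokens
    d_tok=rng.normal(size=(50, 16)),   # 50 document tokens
    bm25=rng.random(50),               # one BM25 score per document token
    q_ent=rng.normal(size=(2, 8)),     # 2 query entities
    d_ent=rng.normal(size=(6, 8)),     # 6 document entities
    p=params,
)
print(score)
```

The point of the sketch is where the signals enter: the BM25 scores and the entity context both act inside the attention computation, instead of being combined with a neural score after the fact. The exact injection and fusion functions used by REGENT may differ from the ones assumed here.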