
Event GDR: Event-Centric Generative Document Retrieval

Yong Guan, Dingxiao Liu, Jinchen Ma, Hao Peng, Xiaozhi Wang, Lei Hou, Ru Li

TL;DR

Event GDR tackles two core issues in generative document retrieval (GDR) by integrating event-centric knowledge: (1) modeling the inner-content correlation of a document through rich event relations, and (2) constructing semantically structured identifiers via a hierarchical event taxonomy. It introduces an exchange-then-reflection (ExR) multi-agent pipeline for event knowledge extraction. It uses Event Representations (EReps) and Event-Relation Representations (ERReps) to capture document content and coherence, and builds identifiers as either Event Identifiers (EIds) or Taxonomy-based Identifiers (ETIds). Training combines multi-task objectives over events, relations, and queries so that indexing and retrieval are learned jointly. Experiments on English NQ and Chinese DuReader show significant gains over baselines, with strong generalization across languages and taxonomies, suggesting practical value for scalable, structured GDR systems.
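To make the multi-task setup concrete, here is a minimal Python sketch of how training pairs for one document might be assembled. Everything in it is an assumption for illustration. The `Event` and `Relation` classes, the linearization formats for EReps and ERReps, and the task prefixes (`doc:`, `erep:`, `errep:`, `query:`) are all hypothetical. So are the event and relation labels and the identifier `doc-42`. The paper's actual formats and objectives may differ. In this sketch, every task is trained to generate the same document identifier, whether its input is raw text, events, relations, or a query. A single sequence-to-sequence model can therefore learn indexing and retrieval together.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    trigger: str                      # word or phrase that evokes the event
    event_type: str                   # label from an event taxonomy (hypothetical labels)
    arguments: dict = field(default_factory=dict)  # role name -> argument text


@dataclass
class Relation:
    head: int                         # index of the head event in the event list
    tail: int                         # index of the tail event
    rel_type: str                     # assumed relation labels, e.g. "CAUSE", "BEFORE"


def event_rep(events):
    """Linearize events into a hypothetical ERep string."""
    parts = []
    for ev in events:
        args = ", ".join(f"{role}: {text}" for role, text in ev.arguments.items())
        parts.append(f"[{ev.event_type}] {ev.trigger}" + (f" ({args})" if args else ""))
    return " ; ".join(parts)


def event_relation_rep(events, relations):
    """Linearize event relations into a hypothetical ERRep string."""
    return " ; ".join(
        f"{events[r.head].trigger} --{r.rel_type}--> {events[r.tail].trigger}"
        for r in relations
    )


def build_training_pairs(doc_text, events, relations, doc_id, queries):
    """Return (source, target) pairs; in this sketch every task targets doc_id."""
    pairs = [
        ("doc: " + doc_text, doc_id),                                   # raw-text indexing
        ("erep: " + event_rep(events), doc_id),                         # event-based indexing
        ("errep: " + event_relation_rep(events, relations), doc_id),    # relation-based indexing
    ]
    pairs += [("query: " + q, doc_id) for q in queries]                 # retrieval task
    return pairs


events = [
    Event("earthquake", "Natural_Disaster", {"place": "Nepal"}),
    Event("collapsed", "Destruction", {"patient": "buildings"}),
]
relations = [Relation(head=0, tail=1, rel_type="CAUSE")]
pairs = build_training_pairs(
    "An earthquake struck Nepal and many buildings collapsed.",
    events, relations, "doc-42", ["what happened in nepal"],
)
for src, tgt in pairs:
    print(f"{src!r} -> {tgt}")
```

The sketch omits model training, loss weighting, and the extraction step that would produce `events` and `relations`. Its purpose is only to show how one document can yield several input views that all map to a single identifier.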

Abstract

Generative document retrieval, an emerging paradigm in information retrieval, learns to connect documents with their identifiers within a single model and has attracted significant attention. However, two challenges remain: (1) inner-content correlation is neglected during document representation; (2) explicit semantic structure is lacking during identifier construction. Events, by contrast, have rich relations and a well-defined taxonomy, which could help address both challenges. Inspired by this, we propose Event GDR, an event-centric generative document retrieval model that integrates event knowledge into this task. Specifically, we use a multi-agent exchange-then-reflection method to extract event knowledge. For document representation, we model each document with its events and their relations to ensure comprehensiveness and capture inner-content correlation. For identifier construction, we map the events to a well-defined event taxonomy to build identifiers with explicit semantic structure. Our method achieves significant improvements over the baselines on two datasets, and we hope it provides insights for future research.
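A small Python sketch can illustrate the idea of building identifiers from an event taxonomy. It relies on several assumptions: a toy two-level taxonomy (`PARENT`), hypothetical event types, and a simple rule chosen only for illustration. Under that rule, each document's identifier is the root-to-leaf taxonomy path of its most frequent event type, followed by a counter that separates documents sharing the same path. This is not the paper's ETId construction. It only shows why taxonomy-derived identifiers carry explicit semantic structure.

```python
from collections import Counter

# Toy two-level taxonomy (hypothetical): child event type -> parent event type.
PARENT = {
    "Natural_Disaster": "Disaster",
    "Destruction": "Disaster",
    "Election": "Politics",
    "Protest": "Politics",
}


def taxonomy_path(event_type):
    """Return the root-to-leaf path for an event type, e.g. ['Disaster', 'Destruction']."""
    path = [event_type]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def build_etids(doc_event_types):
    """Assign each document an identifier built from the taxonomy path of its most
    frequent event type, plus a counter that separates documents with the same path."""
    seen = Counter()
    etids = {}
    for doc_id, types in doc_event_types.items():
        dominant = Counter(types).most_common(1)[0][0]   # ties go to the first-seen type
        prefix = "-".join(taxonomy_path(dominant))
        etids[doc_id] = f"{prefix}-{seen[prefix]}"
        seen[prefix] += 1
    return etids


docs = {
    "d1": ["Natural_Disaster", "Destruction", "Natural_Disaster"],
    "d2": ["Natural_Disaster"],
    "d3": ["Election", "Protest", "Election"],
}
print(build_etids(docs))
```

Documents about the same kind of event (here `d1` and `d2`) share an identifier prefix. A model that generates identifiers token by token therefore first commits to a coarse event category and then refines it.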

Paper Structure (13 sections, 4 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Basic GDR (top) and Event-Centric GDR (bottom).
  • Figure 2: Event/Relation Extraction Process.
  • Figure 3:
  • Figure 5: