GADePo: Graph-Assisted Declarative Pooling Transformers for Document-Level Relation Extraction

Andrei C. Coman, Christos Theodoropoulos, Marie-Francine Moens, James Henderson

TL;DR

Document-level relation extraction (RE) traditionally relies on a text encoder followed by hand-coded pooling. GADePo replaces this rigid pooling with a learnable, graph-informed aggregation framework: special tokens and relation embeddings are added to the input of a joint text-graph Transformer, which then learns how to aggregate. The approach performs competitively with, or better than, ATLOP on Re-DocRED and HacRED, and shows stable behavior and improved recall on challenging datasets, highlighting the benefits of explicit graph guidance in attention. This work offers a flexible, data-driven pooling paradigm that can be customized with domain knowledge and extended to evidence-based and memory-efficient RE setups.
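To make the idea of a declarative pooling specification concrete, the sketch below appends one `<ent>` token per entity and one `<pent>` token per ordered entity pair to a tokenized document, and records directional relations from each special token to its associated mention tokens. It is a minimal illustration under stated assumptions, not the authors' code: the function `build_pooling_input`, the relation labels (`ent->mention`, `pent->subj_mention`, `pent->obj_mention`), the toy document, and the simplification that each mention is a single token index are all hypothetical.

```python
from itertools import permutations


def build_pooling_input(tokens, entities):
    """Extend a tokenized document with pooling tokens and their graph relations.

    tokens:   list of document tokens.
    entities: list of entities, each a list of mention token indices
              (simplifying assumption: one token per mention).
    Returns (extended_tokens, relations, ent_index, pair_index), where
    relations maps (query_index, key_index) -> hypothetical relation label.
    """
    extended = list(tokens)
    relations = {}
    ent_index = {}
    # One <ent> token per entity, linked to that entity's mentions.
    for e, mentions in enumerate(entities):
        ent_index[e] = len(extended)
        extended.append("<ent>")
        for m in mentions:
            relations[(ent_index[e], m)] = "ent->mention"
    pair_index = {}
    # One <pent> token per ordered (subject, object) pair; direction matters.
    for s, o in permutations(range(len(entities)), 2):
        p = len(extended)
        pair_index[(s, o)] = p
        extended.append("<pent>")
        for m in entities[s]:
            relations[(p, m)] = "pent->subj_mention"
        for m in entities[o]:
            relations[(p, m)] = "pent->obj_mention"
    return extended, relations, ent_index, pair_index


# Hypothetical toy document: two entities with two mentions each.
doc = ["Breakout", "was", "developed", "by", "Atari", ";",
       "Atari", "also", "published", "Breakout"]
ext, rel, ent_idx, pair_idx = build_pooling_input(doc, [[0, 9], [4, 6]])
print(ext[len(doc):])  # ['<ent>', '<ent>', '<pent>', '<pent>']
```

Enumerating every ordered pair yields n(n-1) `<pent>` tokens for n entities, so in this sketch the extended input grows quadratically with the number of entities.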

Abstract

Document-level relation extraction typically relies on text-based encoders and hand-coded pooling heuristics to aggregate information learned by the encoder. In this paper, we leverage the intrinsic graph processing capabilities of the Transformer model and propose replacing hand-coded pooling methods with new tokens in the input, which are designed to aggregate information via explicit graph relations in the computation of attention weights. We introduce a joint text-graph Transformer model and a graph-assisted declarative pooling (GADePo) specification of the input, which provides explicit and high-level instructions for information aggregation. GADePo allows the pooling process to be guided by domain-specific knowledge or desired outcomes but still learned by the Transformer, leading to more flexible and customisable pooling strategies. We evaluate our method across diverse datasets and models and show that our approach yields promising results that are consistently better than those achieved by the hand-coded pooling functions.
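The abstract states that graph relations enter the computation of attention weights. One common way to do this, assumed here in the style of relation-aware self-attention (Shaw et al., 2018), is to add an embedding $r_{ij}$ of the relation label between query token $i$ and key token $j$ to the key before the scaled dot product, giving $\mathrm{score}_{ij} = q_i \cdot (k_j + r_{ij}) / \sqrt{d}$; the paper's exact parameterisation may differ. The single-head, pure-Python sketch below uses hypothetical names (`relation_aware_attention`, `rel_emb`) and fixed vectors where a trained model would learn them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def relation_aware_attention(queries, keys, values, relations, rel_emb):
    """Single-head attention where graph relations bias the attention weights.

    queries, keys, values: lists of d-dimensional vectors (already projected).
    relations: {(i, j): label} for query i and key j; absent pairs have no relation.
    rel_emb:   {label: d-dimensional vector}; learned in practice, fixed here.
    Returns (outputs, weights), one row per query.
    """
    d = len(keys[0])
    no_relation = [0.0] * d
    outputs, weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            # Assumed formulation: score_ij = q_i . (k_j + r_ij) / sqrt(d)
            r = rel_emb.get(relations.get((i, j)), no_relation)
            scores.append(dot(q, [kc + rc for kc, rc in zip(k, r)]) / math.sqrt(d))
        a = softmax(scores)
        weights.append(a)
        outputs.append([sum(a[j] * values[j][c] for j in range(len(values)))
                        for c in range(len(values[0]))])
    return outputs, weights


# Toy usage: query 0 (e.g. an <ent> token) is linked to key 1 (one of its mentions).
q = [[1.0, 0.0]]
k = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
_, plain = relation_aware_attention(q, k, v, {}, {})
_, guided = relation_aware_attention(q, k, v, {(0, 1): "ent->mention"},
                                     {"ent->mention": [2.0, 0.0]})
# plain[0] is uniform; guided[0] puts the most weight on key 1.
```

The relation only shifts the attention logits, so the Transformer still learns how much to rely on it; this matches the abstract's point that pooling is guided by the graph but learned by the model.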

Paper Structure (33 sections, 15 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: Document from the Re-DocRED dataset (Tan et al., 2022) involving multiple entities and labels. The subject entity Breakout (red) and the object entity Atari (blue) are connected by the relations "developer" and "publisher". Other entities are indicated as Mention (white).
  • Figure 2: Comparison between the previous method ATLOP (left) and the proposed method GADePo (right), illustrating the document in Figure 1, which contains two entities (red and blue), each with two mentions. In ATLOP, the mentions' encoder outputs are aggregated into entity representations $\bm{h}_e$, and the encoder's attention weights are used to identify which outputs to aggregate into entity-pair representations $\bm{c}^{(s,o)}$. In GADePo, the textual input is extended with the special graph tokens <ent> for entity representations and <pent> for entity-pair representations, and explicit directional graph relations link each of them to its associated mentions. A joint text-graph Transformer then encodes this declarative pooling specification graph and computes the relevant aggregations.
  • Figure 3: Attention weights $\bm{A}$ produced by GADePo via the PLM encoding equation for the document in Figure 1. For clarity, only a subset of <ent> tokens (queries, $y$-axis) and document tokens (keys, $x$-axis) is shown.
  • Figure 4: Performance of ATLOP$^{\star}$ ($\bm{h}_e$ ; $\bm{c}^{(s,o)}$) and GADePo (<ent> ; <pent>) on the development set under varying data availability conditions on Re-DocRED and HacRED (one panel per dataset). The $x$-axis represents the percentage and number of documents from the training dataset, while the $y$-axis displays the $F_1$ score in percentage. Each point on the graph represents the mean value, while error bars indicate the standard deviation derived from five distinct training runs with separate random seeds.
  • Figure 5: Performance of ATLOP$^{\star}$ ($\bm{h}_e$ ; $\bm{c}^{(s,o)}$) and GADePo (<ent> ; <pent>) on the development set under varying data availability conditions on DocRED. The $x$-axis represents the percentage and number of documents from the training dataset, while the $y$-axis displays the $F_1$ score in percentage. Each point on the graph represents the mean value, while error bars indicate the standard deviation derived from five distinct training runs with separate random seeds.
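For comparison with the hand-coded pooling that Figure 2 attributes to ATLOP, here is a simplified sketch following our reading of ATLOP (Zhou et al., 2021): the entity representation $\bm{h}_e$ is a dimension-wise logsumexp over its mention embeddings, and the localized context $\bm{c}^{(s,o)}$ weights the encoder outputs $\bm{H}$ by the normalized, head-summed product of the subject's and object's attention distributions, each averaged over that entity's mentions. The function names are hypothetical, and details such as mention-marker tokens and the downstream classifier are omitted.

```python
import math


def logsumexp_pool(mention_vecs):
    """h_e: dimension-wise logsumexp (a smooth maximum) over mention embeddings."""
    pooled = []
    for c in range(len(mention_vecs[0])):
        column = [m[c] for m in mention_vecs]
        peak = max(column)  # subtract the max for numerical stability
        pooled.append(peak + math.log(sum(math.exp(x - peak) for x in column)))
    return pooled


def entity_attention(mention_attn):
    """Average attention over an entity's mentions: [mention][head][token] -> [head][token]."""
    n = len(mention_attn)
    heads, tokens = len(mention_attn[0]), len(mention_attn[0][0])
    return [[sum(m[h][t] for m in mention_attn) / n for t in range(tokens)]
            for h in range(heads)]


def localized_context(hidden, attn_s, attn_o):
    """c^(s,o) = H^T a^(s,o), with a^(s,o) the normalized head-sum of attn_s * attn_o."""
    tokens = len(hidden)
    q = [sum(attn_s[h][t] * attn_o[h][t] for h in range(len(attn_s)))
         for t in range(tokens)]
    total = sum(q)  # softmax attention is strictly positive in practice, so total > 0
    a = [x / total for x in q]
    return [sum(a[t] * hidden[t][c] for t in range(tokens))
            for c in range(len(hidden[0]))]


# Toy usage: both entities attend to token 1, so the context is H[1].
H = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
A_s = entity_attention([[[0.5, 0.5, 0.0]], [[0.5, 0.5, 0.0]]])
A_o = entity_attention([[[0.0, 1.0, 0.0]]])
c = localized_context(H, A_s, A_o)  # -> [0.0, 1.0]
```

These rules are fixed in advance; GADePo instead lets the Transformer learn the aggregation through the <ent> and <pent> tokens and their graph relations.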