Pointing out the Shortcomings of Relation Extraction Models with Semantically Motivated Adversarials

Gennaro Nolano, Moritz Blum, Basil Ell, Philipp Cimiano

TL;DR

This paper investigates robustness gaps in state-of-the-art relation extraction (RE) models by generating semantically motivated adversarial examples that substitute entity mentions while preserving the expressed relation. It constructs 12 adversarial datasets via four substitution strategies and evaluates multiple SOTA RE models on TACRED to measure resilience. The results reveal a substantial average F1 change of $-48.5\%$ across models, with masking substitutions and surface-form reliance driving the drops, challenging the assumption that entity types alone guide RE. The work highlights the need for robust training and cross-dataset evaluation to mitigate reliance on superficial cues and to improve generalization in RE systems.

Abstract

In recent years, large language models have achieved state-of-the-art performance across various NLP tasks. However, investigations have shown that these models tend to rely on shortcut features, leading to inaccurate predictions and making the models unreliable when generalizing to out-of-distribution (OOD) samples. For instance, in the context of relation extraction (RE), we would expect a model to identify the same relation independently of the entities involved in it. For example, consider the sentence "Leonardo da Vinci painted the Mona Lisa" expressing the created(Leonardo_da_Vinci, Mona_Lisa) relation. If we substitute "Leonardo da Vinci" with "Barack Obama", then the sentence still expresses the created relation. A robust model is supposed to detect the same relation in both cases. In this work, we describe several semantically-motivated strategies to generate adversarial examples by replacing entity mentions and investigate how state-of-the-art RE models perform under pressure. Our analyses show that the performance of these models significantly deteriorates on the modified datasets (avg. of -48.5% in F1), which indicates that these models rely to a great extent on shortcuts, such as surface forms (or patterns therein) of entities, without making full use of the information present in the sentences.
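The substitution idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce the paper's four strategies. The `Example` record, the `substitute_subject` helper, and the token-span layout are all hypothetical, chosen only to show that swapping the subject mention leaves the gold relation label unchanged. The sketch assumes that the subject and object spans do not overlap.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Example:
    """Hypothetical RE instance: tokens, entity spans [start, end), gold label."""
    tokens: Tuple[str, ...]
    subj_span: Tuple[int, int]
    obj_span: Tuple[int, int]
    relation: str


def substitute_subject(ex: Example, new_mention: str) -> Example:
    """Replace the subject mention and keep the relation label unchanged.

    Assumes the subject and object spans do not overlap.
    """
    new_tokens = tuple(new_mention.split())
    start, end = ex.subj_span
    shift = len(new_tokens) - (end - start)

    tokens = ex.tokens[:start] + new_tokens + ex.tokens[end:]

    # An object that follows the subject moves by the length difference.
    o_start, o_end = ex.obj_span
    if o_start >= end:
        o_start, o_end = o_start + shift, o_end + shift

    return replace(
        ex,
        tokens=tokens,
        subj_span=(start, start + len(new_tokens)),
        obj_span=(o_start, o_end),
    )


original = Example(
    tokens=tuple("Leonardo da Vinci painted the Mona Lisa".split()),
    subj_span=(0, 3),
    obj_span=(5, 7),
    relation="created",
)
adversarial = substitute_subject(original, "Barack Obama")
print(" ".join(adversarial.tokens), "->", adversarial.relation)
```

Under this setup, a robust model would be expected to predict `created` for both `original` and `adversarial`; comparing predictions on such pairs is one way to probe reliance on entity surface forms.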

Paper Structure

This paper contains 14 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Sets of mutually confusable relations.
  • Figure 2: Percentage of predictions within the same set of relations for LUKE.
  • Figure 3: Comparison: LUKE's predictions on the standard test set vs. the predictions on the test set following the diff.-type object substitution strategy for selected relations.
  • Figure 4: Percentages of predicted relations adhering to the type constraints posed by the adversarial examples.
  • Figure 5: Jaccard similarity between groups of similar relations.