Label Drop for Multi-Aspect Relation Modeling in Universal Information Extraction

Lu Yang, Jiajia Li, En Ci, Lefei Zhang, Zuchao Li, Ping Wang

TL;DR

LDNet addresses universal information extraction by enabling simultaneous multi-relational extraction with three relation types $TA$, $A2A$, and $AS$ through multi-aspect relation modeling and a label-drop mechanism. It fuses text and image representations with RoPE-based global pointers to generate relation-specific score matrices $S^r$ and uses model transfer learning to propagate knowledge across datasets, optimizing with a combined loss $L = L_{MR} + L_{LD} + L_{MT}$. The approach achieves state-of-the-art or competitive results on 9 tasks and 33 datasets across single-modal and multi-modal settings, including few-shot and zero-shot regimes, even with smaller pretrained backbones. Limitations include data scarcity for extensive multi-modal pre-training and potential document-level extraction challenges, accompanied by ethical considerations for privacy and data usage.

Abstract

Universal Information Extraction (UIE) has garnered significant attention due to its ability to address model explosion problems effectively. Extractive UIE can achieve strong performance using a relatively small model, making it widely adopted. Extractive UIEs generally rely on task instructions for different tasks, including single-target instructions and multiple-target instructions. Single-target instruction UIE enables the extraction of only one type of relation at a time, limiting its ability to model correlations between relations and thus restricting its capability to extract complex relations. While multiple-target instruction UIE allows for the extraction of multiple relations simultaneously, the inclusion of irrelevant relations introduces decision complexity and impacts extraction accuracy. Therefore, for multi-relation extraction, we propose LDNet, which incorporates multi-aspect relation modeling and a label drop mechanism. By assigning different relations to different levels for understanding and decision-making, we reduce decision confusion. Additionally, the label drop mechanism effectively mitigates the impact of irrelevant relations. Experiments show that LDNet outperforms or achieves competitive performance with state-of-the-art systems on 9 tasks and 33 datasets, in single-modal and multi-modal as well as few-shot and zero-shot settings. Code: https://github.com/Lu-Yang666/LDNet

Paper Structure

This paper contains 37 sections, 19 equations, 3 figures, 17 tables, 1 algorithm.

Figures (3)

  • Figure 1: The overview framework of LDNet. LDNet constructs a unified input format, which combines instruction, schema labels, and text. The representation obtained from the PLM is fused with image representation obtained with the image backbone. The multi-modal representation is fed into the multi-aspect relation modeling component to produce probability matrices for TA, A2A, and AS relations, respectively. These matrices are then subjected to label drop to mask out non-existent relations. Finally, the probability matrices are fed into the decoding process to generate target structures.
  • Figure 2: Results of the ablation study on the label drop mechanism. The table on the left shows LDNet's performance with and without the label drop mechanism; the box plot in the middle and the line chart on the right illustrate LDNet's performance under different drop rates.
  • Figure 3: Results of the ablation study on the label drop mechanism on MIE tasks.
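
The label drop step named in the Figure 1 caption (masking the matrices of relations that do not exist in an instance) can be illustrated with a small sketch. This is not the authors' implementation: the function names `label_drop` and `decode`, the `drop_rate` parameter, the use of negative infinity as the mask value, and the toy scores are all assumptions made for illustration. The idea shown is only that a label absent from the instance has its whole score matrix masked before decoding, so it cannot produce a spurious extraction.

```python
import random

NEG_INF = float("-inf")


def label_drop(scores, present, drop_rate=1.0, rng=None):
    """Mask the score matrices of labels that have no relation in the instance.

    scores:    dict mapping label -> 2-D list of token-pair scores
    present:   set of labels that actually occur in the instance
    drop_rate: probability of masking each absent label (hypothetical knob)
    """
    rng = rng or random.Random(0)
    masked = {}
    for label, matrix in scores.items():
        absent = label not in present
        if absent and rng.random() < drop_rate:
            # Absent label: replace every entry with -inf so decoding skips it.
            masked[label] = [[NEG_INF] * len(row) for row in matrix]
        else:
            # Present (or not dropped) label: keep a copy of the scores.
            masked[label] = [row[:] for row in matrix]
    return masked


def decode(score_matrices, threshold=0.0):
    """Return (label, i, j) for every token pair scoring above the threshold."""
    return [
        (label, i, j)
        for label, matrix in score_matrices.items()
        for i, row in enumerate(matrix)
        for j, score in enumerate(row)
        if score > threshold
    ]


# Toy scores (assumed): "location" has a spurious positive entry at (0, 0).
scores = {
    "person": [[2.1, -1.0], [-0.5, -3.0]],
    "location": [[0.4, -2.0], [-1.0, -0.2]],
}

kept = label_drop(scores, present={"person"}, drop_rate=1.0)
print(decode(scores))  # decoding without label drop
print(decode(kept))    # decoding after label drop
```

In this toy case, decoding the raw scores returns a pair for the irrelevant `location` label, while decoding after the mask returns only the `person` pair. A `drop_rate` below 1.0 masks absent labels only some of the time, which is one plausible reading of the "different drop rates" studied in Figure 2.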