Towards Robust Universal Information Extraction: Benchmark, Evaluation, and Solution

Jizhao Zhu, Akang Shi, Zixuan Li, Long Bai, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

TL;DR

The paper tackles the robustness of Universal Information Extraction (UIE) by introducing RUIE-Bench, a benchmark that uses Large Language Models (LLMs) to generate diverse, realistic perturbations across three tasks: named entity recognition (NER), relation extraction (RE), and event detection (ED). It evaluates a wide range of UIE and traditional IE models, revealing substantial performance drops under perturbation and highlighting the generalization challenges of both open- and closed-source LLM-based systems. To address this, the authors propose Loss-guided Data Augmentation (LDA), which iteratively selects hard augmented samples based on inference loss, achieving a 7.5% relative improvement on RUIE-Bench with only 15% of the augmented data and an 8.9% improvement on unseen data. The work provides a cost-efficient framework for evaluating and improving the robustness of UIE systems, with practical implications for deploying more reliable UIE solutions in real-world settings.

Abstract

In this paper, we aim to enhance the robustness of Universal Information Extraction (UIE) by introducing a new benchmark dataset, a comprehensive evaluation, and a feasible solution. Existing robust benchmark datasets have two key limitations: 1) They generate only a limited range of perturbations for a single Information Extraction (IE) task, which fails to evaluate the robustness of UIE models effectively; 2) They rely on small models or handcrafted rules to generate perturbations, often resulting in unnatural adversarial examples. Considering the powerful generation capabilities of Large Language Models (LLMs), we introduce a new benchmark dataset for Robust UIE, called RUIE-Bench, which utilizes LLMs to generate more diverse and realistic perturbations across different IE tasks. Based on this dataset, we comprehensively evaluate existing UIE models and reveal that both LLM-based models and other models suffer from significant performance drops. To improve robustness and reduce training costs, we propose a data-augmentation solution that dynamically selects hard samples for iterative training based on the model's inference loss. Experimental results show that training with only 15% of the data leads to an average 7.5% relative performance improvement across three IE tasks.
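The loss-guided selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `ratio` and `rounds` parameters, the choice to take a fraction of the remaining pool in each round, and the toy loss table are all assumptions made for illustration. The idea shown is only that, in each round, the current model scores the augmented pool, the highest-loss samples are picked for training, and the loop repeats with the updated model.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Sample = TypeVar("Sample")


def select_hard_samples(
    pool: Sequence[Sample],
    loss_fn: Callable[[Sample], float],
    ratio: float,
) -> Tuple[List[Sample], List[Sample]]:
    """Split `pool` into (hardest `ratio` fraction by loss, remainder)."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    if not pool:
        return [], []
    k = max(1, round(len(pool) * ratio))  # always pick at least one sample
    ranked = sorted(pool, key=loss_fn, reverse=True)  # highest loss first
    return ranked[:k], ranked[k:]


def loss_guided_rounds(
    pool: Sequence[Sample],
    loss_fn: Callable[[Sample], float],
    train_fn: Callable[[List[Sample]], None],
    ratio: float,
    rounds: int,
) -> List[Sample]:
    """Hypothetical loop: score the pool, train on the hardest samples, repeat.

    `loss_fn` is expected to reflect the current model state, so the ranking
    changes after every call to `train_fn`.
    """
    remaining = list(pool)
    used: List[Sample] = []
    for _ in range(rounds):
        if not remaining:
            break
        hard, remaining = select_hard_samples(remaining, loss_fn, ratio)
        train_fn(hard)
        used.extend(hard)
    return used


# Toy stand-in for a model: a table of per-sample losses that "training" shrinks.
losses = {"a": 0.2, "b": 1.5, "c": 0.9, "d": 0.1}


def toy_loss(sample: str) -> float:
    return losses[sample]


def toy_train(batch: List[str]) -> None:
    for sample in batch:
        losses[sample] *= 0.1  # pretend training reduces the loss on seen samples


selected = loss_guided_rounds(list(losses), toy_loss, toy_train, ratio=0.25, rounds=2)
print(selected)
```

In this toy run, only two of the four augmented samples are ever used for training, which mirrors the cost argument in the abstract: the model trains on the subset it currently handles worst rather than on the full augmented set.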

Paper Structure

This paper contains 26 sections, 6 figures, 9 tables, 1 algorithm.

Figures (6)

  • Figure 1: Illustration of generated adversarial examples with different kinds of perturbations.
  • Figure 2: Performance comparison of different models under various perturbations on different datasets. Red and blue indicate performance drop and improvement, respectively. KC is short for the KnowCoder model.
  • Figure 3: Prompts for Replace Entity, Triple, and Trigger.
  • Figure 4: Prompts for Change Context.
  • Figure 5: Prompts for Extend Sentence.
  • ...and 1 more figure