Retrieval Augmented Instruction Tuning for Open NER with Large Language Models

Tingyu Xie; Jian Zhang; Yan Zhang; Yuanyuan Liang; Qi Li; Hongwei Wang

Retrieval Augmented Instruction Tuning for Open NER with Large Language Models

Tingyu Xie, Jian Zhang, Yan Zhang, Yuanyuan Liang, Qi Li, Hongwei Wang

TL;DR

This work investigates how to best incorporate information into large language models for information extraction, focusing on open-domain NER. It introduces Retrieval Augmented Instruction Tuning (RA-IT), which augments training inputs with semantically similar retrieved examples to create context-rich instructions, and evaluates this approach in English and Chinese using a new Sky-NER dataset. Across data sizes, backbones, and benchmarks, RA-IT yields consistent improvements over vanilla instruction tuning, with analyses highlighting the benefits of semantically similar retrieval, the value of in-domain examples for inference, and the utility of BM25 filtering when using out-domain contexts. The study provides practical insights for context-enhanced fine-tuning in IE and releases code and Chinese IT data to promote further research and application.

Abstract

The strong capability of large language models (LLMs) has been applied to information extraction (IE) through either retrieval augmented prompting or instruction tuning (IT). However, the best way to incorporate information with LLMs for IE remains an open question. In this paper, we explore Retrieval Augmented Instruction Tuning (RA-IT) for IE, focusing on the task of open named entity recognition (NER). Specifically, for each training sample, we retrieve semantically similar examples from the training dataset as the context and prepend them to the input of the original instruction. To evaluate our RA-IT approach more thoroughly, we construct a Chinese IT dataset for open NER and evaluate RA-IT in both English and Chinese scenarios. Experimental results verify the effectiveness of RA-IT across various data sizes and in both English and Chinese scenarios. We also conduct thorough studies to explore the impacts of various retrieval strategies in the proposed RA-IT framework. Code and data are available at: https://github.com/Emma1066/Retrieval-Augmented-IT-OpenNER

Retrieval Augmented Instruction Tuning for Open NER with Large Language Models

TL;DR

Abstract

Paper Structure (25 sections, 6 figures, 15 tables)

This paper contains 25 sections, 6 figures, 15 tables.

Introduction
Method
Experiment
Experimental Settings
Chinese IT Data Construction
Preliminary Study on Data Efficiency
Main results
Analysis
Related Work
IE with LLMs
Retrieval aware Fine-Tuning
Conclusion
Chinese Data Construction
Diverse Teachers for Data Construction
More Details of Experimental Settings
...and 10 more sections

Figures (6)

Figure 1: The RA-IT template, where the retrieved context consists of semantically similar examples retrieved from the training dataset and is prepended to the original vanilla IT template. The vanilla IT template, presented by zhou2024universalner converts each NER sample into a conversation, where $X_{passage}$ is the input text, $[t1,\dots,t_T]$ are entity types to extract, and $y_i$ is the list of entity mentions that are $t_i$. The highlighted parts are used to compute the loss during training.
Figure 2: Preliminary study of IT data efficiency for open NER in English (left) and Chinese (right) scenarios, where the training data are Pile-NER and Sky-NER respectively. Average zero-shot results of evaluated benchmarks are illustrated. The performance does not necessarily improve as the data increases.
Figure 3: Impacts of training using various retrieval strategies in RA-IT. The average F1 value of the evaluated benchmarks is reported. NN exhibits the best performances, suggesting the need of training with retrieved context.
Figure 4: Impacts of inferece with out-domain examples using various retrieval strategies. The average F1 value of the evaluated benchmarks are reported. w/o exmp. means inference without example. Applying example filtering strategy such as BM25 filtering benefits RAG with out-domain examples.
Figure 5: Impacts of inference with in-domain examples using NN retrieval. The average F1 value of the evaluated benchmarks are reported. $N$-exmp. means the example pool of size $N$. The results indicate that sufficient in-domain examples are helpful for inference with RAG.
...and 1 more figures

Retrieval Augmented Instruction Tuning for Open NER with Large Language Models

TL;DR

Abstract

Retrieval Augmented Instruction Tuning for Open NER with Large Language Models

Authors

TL;DR

Abstract

Table of Contents

Figures (6)