Large Language Models for Generative Information Extraction: A Survey
Derong Xu, Wei Chen, Wenjun Peng, Chao Zhang, Tong Xu, Xiangyu Zhao, Xian Wu, Yefeng Zheng, Yang Wang, Enhong Chen
TL;DR
This survey analyzes the rapid emergence of generative information extraction (IE) using large language models (LLMs). It introduces two taxonomies, one organized by IE subtask (named entity recognition (NER), relation extraction (RE), event extraction (EE), and unified IE) and one by LLM-centric technique (data augmentation, prompting, fine-tuning, and decoding constraints), and it contrasts natural-language with code-based universal IE frameworks. Empirically, universal and prompt-aware approaches often outperform task-specific discriminative models, especially in complex or cross-domain scenarios, while data-efficient strategies remain critical for low-resource domains. The work also surveys domain-specific applications, evaluation insights, and directions toward scalable, robust LLM-based IE, including Open IE.
Abstract
Information extraction (IE) aims to extract structural knowledge from plain natural language texts. Recently, generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation. As a result, numerous works have been proposed to integrate LLMs into IE tasks based on a generative paradigm. To provide a comprehensive, systematic review of LLM efforts for IE tasks, this study surveys the most recent advancements in the field. We first present an extensive overview by categorizing these works in terms of various IE subtasks and techniques, and then we empirically analyze the most advanced methods and identify emerging trends in IE tasks with LLMs. Based on this review, we identify several technical insights and promising research directions that deserve further exploration in future studies. We maintain a public repository and consistently update related works and resources on GitHub (LLM4IE repository: https://github.com/quqxui/Awesome-LLM4IE-Papers).
