A Simple but Effective Approach to Improve Structured Language Model Output for Information Extraction

Yinghao Li, Rampi Ramprasad, Chao Zhang

TL;DR

This paper addresses the challenge of producing structured outputs from large language models for information extraction. It introduces Generate and Organize (G&O), a two-stage prompting approach that first generates free-form natural-language content and then structures it into a predefined format with a cleanup step. Across diverse NER and RE tasks and multiple LLMs, G&O yields significant zero-shot improvements over conventional one-step prompts, with ablation analyses confirming the contribution of each component. The method is flexible and can be combined with strategies like self-consistency, and the authors provide public code to support reproducibility and further research.

Abstract

Large language models (LLMs) have demonstrated impressive abilities in generating unstructured natural language according to instructions. However, their performance can be inconsistent when tasked with producing text that adheres to specific structured formats, which is crucial in applications like named entity recognition (NER) or relation extraction (RE). To address this issue, this paper introduces an efficient method, G&O, to enhance their structured text generation capabilities. It breaks the generation into a two-step pipeline: initially, LLMs generate answers in natural language as intermediate responses. Subsequently, LLMs are asked to organize the output into the desired structure, using the intermediate responses as context. G&O effectively separates the generation of content from the structuring process, reducing the pressure of completing two orthogonal tasks simultaneously. Tested on zero-shot NER and RE, the results indicate a significant improvement in LLM performance with minimal additional effort. This straightforward and adaptable prompting technique can also be combined with other strategies, like self-consistency, to further elevate LLM capabilities in various structured text generation tasks.
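
The two-step pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' released code. The prompt wording, the JSON output format, the function names (generate_and_organize, clean_up), and the stand-in model fake_llm are all assumptions made for this example. The cleanup rule shown here (keep only unique strings that literally appear in the sentence) is one plausible reading of the "cleanup step" and may differ from the paper's.

```python
import json
from typing import Callable, List

# Hypothetical prompt templates; the paper's actual prompts are not reproduced here.
GENERATE_PROMPT = (
    "Sentence: {sentence}\n"
    "List every {entity_type} entity mentioned in the sentence and explain briefly."
)
ORGANIZE_PROMPT = (
    "Sentence: {sentence}\n"
    "Draft answer: {draft}\n"
    "Rewrite the draft answer as a JSON list of strings, one entity per item."
)


def clean_up(structured: str, sentence: str) -> List[str]:
    """Assumed cleanup: parse JSON, keep unique strings found verbatim in the sentence."""
    try:
        items = json.loads(structured)
    except json.JSONDecodeError:
        return []  # unparseable output yields no entities
    if not isinstance(items, list):
        return []
    seen, kept = set(), []
    for item in items:
        if isinstance(item, str) and item and item in sentence and item not in seen:
            seen.add(item)
            kept.append(item)
    return kept


def generate_and_organize(
    sentence: str, entity_type: str, llm: Callable[[str], str]
) -> List[str]:
    # Step 1 (generate): ask for a free-form natural-language answer.
    draft = llm(GENERATE_PROMPT.format(sentence=sentence, entity_type=entity_type))
    # Step 2 (organize): ask for the structured form, with the draft as context.
    structured = llm(ORGANIZE_PROMPT.format(sentence=sentence, draft=draft))
    return clean_up(structured, sentence)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline (hypothetical)."""
    if "Draft answer:" in prompt:
        # Deliberately includes an entity that is absent from the sentence.
        return '["Paris", "France", "Berlin"]'
    return "The sentence mentions Paris, a city, and France, a country."


if __name__ == "__main__":
    print(generate_and_organize("Paris is the capital of France.", "location", fake_llm))
    # prints ['Paris', 'France']
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real deployment would replace fake_llm with a function that sends the prompt to a model and returns its text.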

Paper Structure (24 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: The pipeline of G&O for NER, compared with Traditional One-Step prompting methods.
  • Figure 2: GPT-3.5's natural language responses tend to include irrelevant entities (marked in red). Even when these are clearly explained, the irrelevant terms still pose a difficulty for GPT-3.5 during format organization.
  • Figure 3: Comparing the precision and recall of G&O-NER with One-Step on NER datasets.
  • Figure 4: F1 scores of different LMs with G&O and One-Step prompting, macro-averaged over all datasets.
  • Figure 5: The F1 scores of GPT-3.5 with different prompting approaches on RE datasets.