Self-Correction Makes LLMs Better Parsers

Ziyan Zhang, Yang Hou, Chen Gong, Zhenghua Li

TL;DR

This work analyzes the limitations of LLM-based constituency parsing, revealing that LLMs struggle to fully utilize grammar rules from treebanks due to few-shot constraints. It introduces a training-free self-correction framework that uses existing treebank rules to guide LLM corrections via unmatch and structure correction, including rule-based error processing and example-guided prompts. Across PTB, CTB5, and MCTB (English and Chinese), the method delivers substantial in-domain and cross-domain gains across multiple LLMs, notably boosting recall and mitigating overly flat parses. The results demonstrate that LLMs can acquire structural knowledge from treebanks through guided self-correction, enhancing parsing robustness without additional training.

Abstract

Large language models (LLMs) have achieved remarkable success across various natural language processing (NLP) tasks. However, recent studies suggest that they still face challenges in performing fundamental NLP tasks essential for deep language understanding, particularly syntactic parsing. In this paper, we conduct an in-depth analysis of LLM parsing capabilities, delving into the specific shortcomings of their parsing results. We find that these shortcomings may stem from LLMs' limited ability to fully leverage grammar rules in existing treebanks, which restricts their capability to generate valid syntactic structures. To help LLMs acquire this knowledge without additional training, we propose a self-correction method that leverages grammar rules from existing treebanks to guide LLMs in correcting previous errors. Specifically, we automatically detect potential errors and dynamically search for relevant rules, offering hints and examples to guide LLMs in making corrections themselves. Experimental results on three datasets with various LLMs demonstrate that our method significantly improves performance in both in-domain and cross-domain settings on the English and Chinese datasets.
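
To make the idea of detecting potential errors with treebank grammar rules more concrete, the following is a minimal sketch, not the authors' implementation. It assumes PTB-style bracketed trees, and it assumes that a production rule in an LLM-generated tree is a candidate error when it never occurs in the treebank. The function names (`parse_brackets`, `extract_rules`, `find_unseen_rules`) and the toy sentences are hypothetical and chosen only for illustration.

```python
def parse_brackets(text):
    """Parse a PTB-style bracketed string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read_node()


def extract_rules(node):
    """Yield (lhs, rhs) for every non-lexical production rule in the tree."""
    label, children = node
    if all(isinstance(child, str) for child in children):
        return  # preterminal over a word: lexical rules are skipped
    yield (label, tuple(child[0] for child in children if not isinstance(child, str)))
    for child in children:
        if not isinstance(child, str):
            yield from extract_rules(child)


def find_unseen_rules(predicted, treebank_rules):
    """Return rules of the predicted tree that never occur in the treebank (assumed error signal)."""
    return [rule for rule in extract_rules(parse_brackets(predicted))
            if rule not in treebank_rules]


# Toy "treebank" with one gold tree (hypothetical data).
gold_trees = [
    "(S (NP (DT the) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))",
]
treebank_rules = {rule for tree in gold_trees for rule in extract_rules(parse_brackets(tree))}

# An overly flat prediction: the PP and its inner NP are missing.
predicted_tree = "(S (NP (DT the) (NN cat)) (VP (VBD sat) (IN on) (DT the) (NN mat)))"
unseen = find_unseen_rules(predicted_tree, treebank_rules)
print(unseen)
```

In this toy case the only flagged rule is the flat `VP` expansion, which is the kind of subtree a correction prompt could then target with relevant treebank rules and examples. How the paper actually ranks rules and builds prompts is not specified in this summary, so that part is left out.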

Paper Structure

This paper contains 32 sections, 7 figures, 10 tables.

Figures (7)

  • Figure 1: Four types of errors.
  • Figure 2: The overview of the four types of errors made by different models on the PTB.
  • Figure 3: The process of structure correction. For clarity, we provide a typical example rule for each processing method in the figure. In practice, each rule needs to undergo three types of processing methods and be ranked.
  • Figure 4: The effect of unmatch correction and structure correction with GPT-4 on the PTB dataset. "unmatch" represents unmatch correction and "h" represents the height of subtrees in the structure correction.
  • Figure 5: Rule statistics of the parsing results generated from GPT-4 before and after applying our self-correction method on the PTB dataset.
  • ...and 2 more figures