Self-Correction Makes LLMs Better Parsers
Ziyan Zhang, Yang Hou, Chen Gong, Zhenghua Li
TL;DR
This work analyzes the limitations of LLM-based constituency parsing and finds that LLMs struggle to fully utilize treebank grammar rules when given only a few in-context examples. It introduces a training-free self-correction framework that uses existing treebank rules to guide LLM corrections in two steps, unmatch correction and structure correction, combining rule-based error processing with example-guided prompts. Across PTB, CTB5, and MCTB (English and Chinese), the method delivers substantial in-domain and cross-domain gains for multiple LLMs, notably boosting recall and mitigating overly flat parses. The results indicate that LLMs can acquire structural knowledge from treebanks through guided self-correction, which makes parsing more robust without additional training.
Abstract
Large language models (LLMs) have achieved remarkable success across various natural language processing (NLP) tasks. However, recent studies suggest that they still face challenges in fundamental NLP tasks essential for deep language understanding, particularly syntactic parsing. In this paper, we conduct an in-depth analysis of LLM parsing capabilities, examining the specific shortcomings of their parsing results. We find that these shortcomings may stem from LLMs' limited ability to fully leverage the grammar rules in existing treebanks, which restricts their capacity to generate valid syntactic structures. To help LLMs acquire this knowledge without additional training, we propose a self-correction method that leverages grammar rules from existing treebanks to guide LLMs in correcting their previous errors. Specifically, we automatically detect potential errors and dynamically search for relevant rules, offering hints and examples that guide LLMs to make the corrections themselves. Experimental results on three datasets with various LLMs demonstrate that our method significantly improves performance in both in-domain and cross-domain settings on English and Chinese datasets.
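The error-detection idea in the abstract, checking a predicted parse against the grammar rules found in a treebank, can be illustrated with a small sketch. This is not the authors' implementation: it assumes trees in bracketed form, treats a "rule" as a parent label with its sequence of child labels, and the helper names (`parse_tree`, `extract_rules`, `find_unmatched`) are hypothetical. The toy treebank holds a single tree.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (parent label, child labels)


def parse_tree(text: str):
    """Parse a bracketed tree into (label, children); leaves are plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word under a POS tag
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def extract_rules(tree) -> Set[Rule]:
    """Collect phrase-level rules; POS-tag-to-word nodes yield no rule."""
    label, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    rules: Set[Rule] = set()
    if subtrees:
        rules.add((label, tuple(c[0] for c in subtrees)))
        for child in subtrees:
            rules |= extract_rules(child)
    return rules


def find_unmatched(predicted: str, treebank_rules: Set[Rule]) -> List[Rule]:
    """Return rules used by the prediction that never occur in the treebank."""
    return sorted(extract_rules(parse_tree(predicted)) - treebank_rules)


# Toy treebank (an assumption for illustration only).
treebank = ["(S (NP (DT the) (NN cat)) (VP (VBD sat)))"]
treebank_rules: Set[Rule] = set()
for gold in treebank:
    treebank_rules |= extract_rules(parse_tree(gold))

# An overly flat prediction that attaches every tag directly under S.
flat_prediction = "(S (DT the) (NN cat) (VBD sat))"
unmatched = find_unmatched(flat_prediction, treebank_rules)
```

Here the flat prediction uses the rule S -> DT NN VBD, which does not occur in the toy treebank, so it is flagged as a candidate for correction. In a self-correction setting, flagged rules like this one could be paired with similar treebank rules and examples in a follow-up prompt; how the paper selects and presents those hints is not specified in this section.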
