From Partial to Strictly Incremental Constituent Parsing
Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares
TL;DR
This work investigates fully incremental constituent parsing using strictly left-to-right encoder–decoder architectures, where each word of the input prefix $w_1 \dots w_i$ is added to a partial tree with a small lookahead $k \in \{0, 1, 2\}$. It evaluates two incremental decoding paradigms, a tagging-based approach and a transition-based approach with a graph neural network, and compares incremental encoders, including multilingual LLMs (e.g., mGPT, BLOOM-560M) and a 4-layer LSTM baseline. The results show that encoder quality largely governs performance under strict incrementality: transition-based decoders outperform tagging, and modest delays substantially improve accuracy, though gaps to non-incremental baselines remain, especially for lower-resource languages and in the absence of bidirectional encoding. The findings highlight encoder-centered bottlenecks and suggest directions toward real-time speculative decoding and broader multilingual evaluation to move parsing closer to human-like incremental processing.
Abstract
We study incremental constituent parsers to assess their capacity to output trees based on prefix representations alone. Guided by strictly left-to-right generative language models and tree-decoding modules, we build parsers that adhere to a strong definition of incrementality across languages. This builds on prior work that claimed incrementality but mostly enforced it on only the encoder or only the decoder. Finally, we analyze our parsers against non-incremental and partially incremental models.
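
As a rough illustration of the strong definition of incrementality used here, the sketch below lists which words a parser would be allowed to see when attaching word $w_i$ under a lookahead of $k$. The function name `visible_prefixes` and the clipping of the lookahead at the end of the sentence are assumptions made for this example; the paper's actual handling of the final $k$ words may differ.

```python
from typing import Iterator, List, Tuple


def visible_prefixes(words: List[str], k: int) -> Iterator[Tuple[int, List[str]]]:
    """Yield (i, visible words) for each step i = 1..n under lookahead k.

    Hypothetical helper: at step i the parser attaches w_i and may read
    at most w_1 ... w_{i+k}, never anything further to the right.
    """
    if k < 0:
        raise ValueError("lookahead k must be non-negative")
    n = len(words)
    for i in range(1, n + 1):
        # Assumption: the lookahead window is clipped at the sentence end.
        yield i, words[: min(i + k, n)]


if __name__ == "__main__":
    sentence = ["The", "cat", "sleeps"]
    for k in (0, 1, 2):
        print(f"k = {k}")
        for i, visible in visible_prefixes(sentence, k):
            print(f"  step {i}: attach {sentence[i - 1]!r}, sees {visible}")
```

With $k = 0$ the visible context at step $i$ is exactly the prefix $w_1 \dots w_i$; larger $k$ corresponds to delaying each attachment decision by a few words.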
