We can still parse using syntactic rules

Ghaly Hussein

TL;DR

The paper revisits explicit syntactic parsing in the era of transformers, proposing a GPSG/HPSG-inspired parsing framework that yields both dependency and constituency structures while handling noise and incomplete input. It presents an incremental, rule-based parser comprising tokenization, probabilistic POS tagging, phrase creation, indexing and scanning, phrase projection, and a Dijkstra-like connecting and reranking component, with results exported as both dependency and constituency structures. Evaluation on UD English treebanks shows an average UAS of about 54.5% on the development corpora and 53.8% on the test corpora, with the best results obtained using a larger rule set, alongside qualitative demonstrations of dual-structure outputs and slash-feature handling. The work outlines how the approach could be extended to new languages through language-specific POS inventories and rules, contributing to explainable NLP by bringing longstanding syntactic theories into computational parsing, although it is currently English-only and relies on handcrafted rules.
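The connecting step is described only as "Dijkstra-like". As a hedged illustration of what such a step can look like (an assumption, not the paper's algorithm), the sketch below treats each candidate phrase as an edge between two token boundaries, costed by the negative log of a hypothetical phrase probability, and runs Dijkstra's algorithm from the first boundary to the last. A hypothetical `skip_cost` edge lets the search step over a token that no phrase covers, which is one simple way to keep producing an analysis for noisy or incomplete input. The names `connect_phrases`, `phrases` and `skip_cost` are illustrative only.

```python
import heapq
import math

def connect_phrases(n_tokens, phrases, skip_cost=5.0):
    """Find the cheapest chain of phrases covering token boundaries 0..n_tokens.

    Hypothetical input: phrases is a list of (start, end, label, probability)
    with half-open spans [start, end) and 0 < probability <= 1.
    Each phrase becomes an edge start -> end with cost -log(probability).
    """
    edges = {i: [] for i in range(n_tokens + 1)}
    for start, end, label, prob in phrases:
        edges[start].append((end, -math.log(prob), label))
    # Assumed fallback: skipping one uncovered token costs skip_cost,
    # so the final boundary is always reachable.
    for i in range(n_tokens):
        edges[i].append((i + 1, skip_cost, None))

    dist = {0: 0.0}
    back = {}
    heap = [(0.0, 0)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == n_tokens:
            break  # the end boundary's distance is final once popped
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, cost, label in edges[node]:
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                back[nxt] = (node, label)
                heapq.heappush(heap, (nd, nxt))

    # Walk back from the last boundary to recover the chosen phrases.
    path, node = [], n_tokens
    while node != 0:
        prev, label = back[node]
        path.append((prev, node, label))
        node = prev
    return list(reversed(path)), dist[n_tokens]

# "The dog barked loudly": four tokens, boundaries 0..4.
candidates = [(0, 2, "NP", 0.9), (2, 3, "VP", 0.8), (3, 4, "ADVP", 0.7), (2, 4, "VP", 0.6)]
path, cost = connect_phrases(4, candidates)
print(path)  # [(0, 2, 'NP'), (2, 4, 'VP')]
```

Here NP followed by the wider VP costs about 0.62, against about 0.69 for NP + VP + ADVP, so the wider VP wins. Keeping the k cheapest chains rather than only the best would be one way to supply several hypotheses for reranking.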

Abstract

This research introduces a new parsing approach based on earlier syntactic work on context-free grammar (CFG) and generalized phrase structure grammar (GPSG). The approach comprises both a new parsing algorithm and a set of syntactic rules and features that overcome the limitations of CFG. It generates both dependency and constituency parse trees while accommodating noise and incomplete parses. The system was tested on data from Universal Dependencies, achieving a promising average Unlabeled Attachment Score (UAS) of 54.5% on the development set (7 corpora) and 53.8% on the test set (12 corpora). The system also provides multiple parse hypotheses, allowing further reranking to improve parsing accuracy. The approach also makes much of the theoretical syntactic work done since the 1950s usable in a computational context. Applying it yields a transparent and interpretable NLP model for processing language input.
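The reported scores are Unlabeled Attachment Scores: the share of tokens whose predicted head matches the gold head, ignoring dependency labels. A minimal sketch of the metric follows, assuming each corpus is given as aligned lists of gold and predicted head indices (0 for the root) and that "average" means the unweighted mean of per-corpus scores; the abstract does not say whether the average is token-weighted. The hypothetical `oracle_uas` helper shows why multiple parse hypotheses matter: it scores the best hypothesis for a sentence, an upper bound on what a reranker choosing among them could reach.

```python
from typing import List, Sequence, Tuple

def uas(gold_heads: Sequence[int], pred_heads: Sequence[int]) -> float:
    """Share of tokens whose predicted head index equals the gold head index."""
    if len(gold_heads) != len(pred_heads) or not gold_heads:
        raise ValueError("need non-empty, token-aligned head sequences")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

def average_uas(corpora: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Unweighted (macro) mean of per-corpus UAS; an assumed averaging scheme."""
    scores = [uas(gold, pred) for gold, pred in corpora]
    return sum(scores) / len(scores)

def oracle_uas(gold_heads: Sequence[int], hypotheses: List[Sequence[int]]) -> float:
    """Best UAS among several parse hypotheses for one sentence (reranking upper bound)."""
    return max(uas(gold_heads, h) for h in hypotheses)

# Heads are 1-based token indices; 0 marks the root.
gold = [2, 3, 0, 3]  # The -> dog, dog -> barked, barked -> ROOT, loudly -> barked
pred = [2, 3, 0, 2]  # loudly wrongly attached to dog
print(uas(gold, pred))  # 0.75
```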

Paper Structure (21 sections, 4 figures)

Introduction
Related Work
Method
Parsing rules
Parsing Algorithm
Tokenization
POS Tagging
Phrase Creation
Phrase Indexing
Rule Scanning
Adjacent Phrase Scanning
Phrase Projection
Connecting and Reranking
Output and Export
Experimental Setup
...and 6 more sections

Figures (4)

Figure 1: Dependency vs. Constituency Structures (from martin_vector_2019)
Figure 2: Parser performance on different UD datasets, compared with spaCy parser performance
Figure 3: Illustration of parser output with the combined dependency and constituency structures
Figure 4: Illustration of parser output for applying GPSG slash features
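Figures 1 and 3 contrast and combine the two kinds of structure. One standard way to obtain both from a single analysis, sketched here as an assumption rather than as the paper's export routine, is to mark a head daughter in every phrase (in the spirit of the head daughter of GPSG/HPSG) and read the dependency arcs off the phrase structure: the lexical head of each non-head daughter depends on the lexical head of its phrase. The `Phrase` class and the `lexical_head` and `to_dependencies` functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class Phrase:
    label: str
    children: List[Union["Phrase", int]]  # an int is a 1-based token index
    head: int = 0                         # position of the head daughter in children

def lexical_head(node: Union[Phrase, int]) -> int:
    """Follow head daughters down until a token index is reached."""
    while isinstance(node, Phrase):
        node = node.children[node.head]
    return node

def to_dependencies(root: Phrase) -> Dict[int, int]:
    """Map each token to its head token; 0 is the artificial root."""
    heads = {lexical_head(root): 0}
    stack = [root]
    while stack:
        phrase = stack.pop()
        h = lexical_head(phrase)
        for i, child in enumerate(phrase.children):
            if i != phrase.head:
                # Non-head daughter: its lexical head attaches to the phrase's head.
                heads[lexical_head(child)] = h
            if isinstance(child, Phrase):
                stack.append(child)
    return heads

# "The(1) dog(2) barked(3) loudly(4)": the constituency tree carries the head marks.
np_ = Phrase("NP", [1, 2], head=1)
vp = Phrase("VP", [3, 4], head=0)
tree = Phrase("S", [np_, vp], head=1)
print(sorted(to_dependencies(tree).items()))  # [(1, 2), (2, 3), (3, 0), (4, 3)]
```

Because the dependency arcs are derived from the same headed phrases, the constituency tree and the dependency tree stay consistent with each other by construction.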
