We can still parse using syntactic rules

Ghaly Hussein

TL;DR

The paper revisits explicit syntactic parsing in the era of transformers, proposing a GPSG/HPSG-inspired framework that yields both dependency and constituency structures while tolerating noise and incomplete input. It presents an incremental, rule-based parser whose pipeline covers tokenization, probabilistic POS tagging, phrase creation, indexing and scanning, phrase projection, and a Dijkstra-like connecting and reranking component; results are exported as both dependency and constituency outputs. Evaluated on Universal Dependencies (UD) English treebanks, the parser reaches an average UAS of 54.5% on the development sets and 53.8% on the test sets, figures obtained with the larger rule set. Qualitative examples show the dual-structure output and the handling of slash features. The work outlines a path to new languages via language-specific POS inventories and rules, and contributes to explainable NLP by integrating longstanding syntactic theory with computational parsing, although it is currently English-only and reliant on handcrafted rules.

Abstract

This research introduces a new parsing approach based on earlier syntactic work on context-free grammar (CFG) and generalized phrase structure grammar (GPSG). The approach comprises both a new parsing algorithm and a set of syntactic rules and features that overcome the limitations of CFG. It generates both dependency and constituency parse trees while accommodating noise and incomplete parses. The system was tested on data from Universal Dependencies, showing a promising average Unlabeled Attachment Score (UAS) of 54.5% on the development sets (7 corpora) and 53.8% on the test sets (12 corpora). The system also provides multiple parse hypotheses, allowing further reranking to improve parsing accuracy. The approach brings much of the theoretical syntactic work done since the 1950s into a computational setting, and its application provides a transparent and interpretable NLP model for processing language input.
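
To make the reported metric concrete, here is a minimal sketch of how an Unlabeled Attachment Score and an average over corpora can be computed. It is not the paper's evaluation code: the function names `uas` and `macro_average` are hypothetical, heads are assumed to be 1-based token indices with 0 for the root (the CoNLL-U convention), and the unweighted mean over corpora is an assumption, since the text above does not say whether the average is taken per corpus or per token, or whether punctuation is scored.

```python
from typing import Sequence


def uas(gold_heads: Sequence[int], pred_heads: Sequence[int]) -> float:
    """Return the fraction of tokens whose predicted head equals the gold head."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists must align token by token")
    if not gold_heads:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)


def macro_average(scores: Sequence[float]) -> float:
    """Unweighted mean of per-corpus scores (assumed averaging scheme)."""
    if not scores:
        raise ValueError("no scores to average")
    return sum(scores) / len(scores)


# Toy sentence "She reads books": heads are 1-based, 0 marks the root.
gold = [2, 0, 2]
pred = [2, 0, 1]  # the last token is attached to the wrong head
print(f"UAS: {uas(gold, pred):.3f}")
```

Because UAS ignores dependency labels, a token counts as correct as soon as it is attached to the right head, whatever relation the parser assigns to that arc.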

Paper Structure

This paper contains 21 sections and 4 figures.

Figures (4)

  • Figure 1: Dependency vs. constituency structures (from martin_vector_2019)
  • Figure 2: Parser performance on different UD datasets, compared with the spaCy parser
  • Figure 3: Illustration of parser output with the combined dependency and constituency structures
  • Figure 4: Illustration of parser output for applying GPSG slash features