Multilingual Nonce Dependency Treebanks: Understanding how Language Models represent and process syntactic structure

David Arps; Laura Kallmeyer; Younes Samih; Hassan Sajjad

TL;DR

This work proposes Semantically Perturbed Universal Dependencies (SPUD), a method for creating grammatically valid nonce treebanks that perturb lexical co-occurrence while preserving syntactic structure. SPUD is applied to five languages and evaluated in two ways: language model scoring with autoregressive and masked models, and dependency probing with the DepProbe framework. Autoregressive perplexity is more sensitive to nonce content than masked pseudo-perplexity, and subword information mitigates some of the effect for MLMs. Probing results show that much of the syntactic information survives semantic perturbation, with attachment predictions more affected than relation labeling. Overall, SPUD provides a principled way to study the interplay of syntax and semantics in multilingual LMs and offers data resources and a tutorial to support further research.

Abstract

We introduce SPUD (Semantically Perturbed Universal Dependencies), a framework for creating nonce treebanks for the multilingual Universal Dependencies (UD) corpora. SPUD data satisfies syntactic argument structure, provides syntactic annotations, and ensures grammaticality via language-specific rules. We create nonce data in Arabic, English, French, German, and Russian, and demonstrate two use cases of SPUD treebanks. First, we investigate the effect of nonce data on word co-occurrence statistics, as measured by the perplexity scores of autoregressive language models (ALMs) and masked language models (MLMs). We find that ALM scores are significantly more affected by nonce data than MLM scores. Second, we show how nonce data affects the performance of syntactic dependency probes. We replicate the findings of Müller-Eberstein et al. (2022) on nonce test data and show that performance declines for both MLMs and ALMs relative to the original test data. However, most of the performance is retained, suggesting that the probe indeed learns syntax independently of semantics.
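To make the scoring comparison concrete, the following is a minimal sketch of how a perplexity-style score can be computed from per-token log-probabilities. It is not the authors' implementation. It assumes the log-probabilities have already been obtained from a model: for an ALM, each value is the log-probability of a token given its left context; for an MLM pseudo-perplexity, each value is the log-probability of a token when that token alone is masked. The function names `perplexity` and `score_ratio` are hypothetical, and the ratio shown is only one plausible way to compare nonce and original data.

```python
import math


def perplexity(token_logprobs):
    """Return exp of the negative mean log-probability (natural log).

    With left-context log-probabilities this is the usual ALM perplexity;
    with one-token-masked log-probabilities it is an MLM pseudo-perplexity.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def score_ratio(nonce_logprobs, original_logprobs):
    """Hypothetical helper: how much higher the score is on nonce data.

    A value above 1 means the model finds the nonce text less predictable
    than the original text.
    """
    return perplexity(nonce_logprobs) / perplexity(original_logprobs)


# Toy values (assumed, not taken from the paper): every original token
# has probability 0.5 and every nonce token has probability 0.25.
original = [math.log(0.5)] * 4
nonce = [math.log(0.25)] * 4
print(perplexity(original), perplexity(nonce), score_ratio(nonce, original))
```

Because both scores reduce to the same formula over different conditional probabilities, the sensitivity of a model to nonce content can be compared by applying the same ratio to ALM and MLM log-probabilities.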

Paper Structure (80 sections, 6 equations, 9 figures, 22 tables)

Introduction
Related Work
Automatically modified Dependency Trees
Structural Probing
Syntactic and Semantic Information in LMs
Scoring Functions for LMs
Nonce Treebanks for Five Languages
Generating Nonce Data
Language-independent algorithm
Language-specific modifications
Quality of the generated data
Human evaluation
Scoring SPUD with ALMs and MLMs
Scoring Functions for LMs
ALMs: Perplexity ($PPL$)
...and 65 more sections

Figures (9)

Figure 1: SPUD data creation
Figure 2: Intrinsic evaluation results for English and Arabic. Plots for other languages are in App. \ref{app:score-ratios}.
Figure 3: Arabic SPUD examples (in transliteration).
Figure 4: German SPUD examples.
Figure 5: English SPUD examples.
...and 4 more figures
