ERAS: Evaluating the Robustness of Chinese NLP Models to Morphological Garden Path Errors

Qinchan Li, Sophie Hao

TL;DR

Chinese NLP models are vulnerable to morphological garden path errors: errors caused by a failure to resolve local word segmentation ambiguities using sentence-level morphosyntactic context. Sentiment analysis models with character-level tokenization make these errors implicitly, even without an explicit word segmentation step in the pipeline.

Abstract

In languages without orthographic word boundaries, NLP models perform word segmentation, either as an explicit preprocessing step or as an implicit step in an end-to-end computation. This paper shows that Chinese NLP models are vulnerable to morphological garden path errors: errors caused by a failure to resolve local word segmentation ambiguities using sentence-level morphosyntactic context. We propose a benchmark, ERAS, that tests a model's vulnerability to morphological garden path errors by comparing its behavior on sentences with and without local segmentation ambiguities. Using ERAS, we show that word segmentation models make garden path errors on locally ambiguous sentences, but do not make equivalent errors on unambiguous sentences. We further show that sentiment analysis models with character-level tokenization make implicit garden path errors, even without an explicit word segmentation step in the pipeline. Our results indicate that models' segmentation of Chinese text often fails to account for morphosyntactic context.

Paper Structure

This paper contains 40 sections, 1 equation, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Morphological garden paths involve local ambiguity at the level of word segmentation. In the sentence above, the bigram 留心 constitutes a valid word, but segmenting it as a word renders the sentence unparsable. A valid parse is obtained if the character 心 forms a word with the following character, 机.
  • Figure 2: ERAS consists of locally ambiguous test sentences paired with unambiguous control sentences. Test and control sentences differ in terms of a three-character test site, where test sentences contain a two-character canary word whose existence renders the sentence unparsable.
  • Figure 3: Examples of paradigms in ERAS. Each paradigm consists of two templates, one for test sentences and one for control sentences. Each paradigm also has a branching structure (left or right) and a true and canary sentiment value (+/−, +/0, −/0, or −/+). Templates contain slots belonging to seven possible types: concept, entity, modifier, noun, object, person, or verb. The templates shown here contain one person slot and one noun slot.
  • Figure 4: Our occlusion study ablates canary words by masking out ("occluding") the character in the test site not belonging to the true word.
  • Figure 5: Test set accuracy is correlated with GPER, but exhibits no obvious relationship with ERAS accuracy.
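
The local ambiguity described in Figure 1 can be illustrated with a toy example. The sketch below is not the authors' method or any model evaluated in the paper: it is a hypothetical greedy (forward maximum matching) segmenter with a two-word lexicon assumed for illustration. Because the full sentence from Figure 1 is not reproduced here, the code operates only on the three-character span 留心机. It shows how a segmenter that ignores the wider context commits to 留心 and strands 机, while the segmentation that Figure 1 calls valid, with 心机 as a word, is also available.

```python
def forward_max_match(text, lexicon, max_len=2):
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon word, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def all_segmentations(text, lexicon):
    """Enumerate every segmentation whose multi-character words are in the
    lexicon; single characters are always allowed."""
    if not text:
        return [[]]
    results = []
    for size in range(1, len(text) + 1):
        head = text[:size]
        if size == 1 or head in lexicon:
            for rest in all_segmentations(text[size:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical toy lexicon: only the two words named in Figure 1.
TOY_LEXICON = {"留心", "心机"}
# The three-character span from Figure 1 (the surrounding sentence is omitted).
TEST_SITE = "留心机"

greedy = forward_max_match(TEST_SITE, TOY_LEXICON)
candidates = all_segmentations(TEST_SITE, TOY_LEXICON)

print("greedy:", greedy)
for seg in candidates:
    print("candidate:", seg)
```

Choosing among the candidates requires information from outside the span, which is the sentence-level morphosyntactic context that the abstract says models often fail to use.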