PMB5: Gaining More Insight into Neural Semantic Parsing with Challenging Benchmarks

Xiao Zhang, Chunliu Wang, Rik van Noord, Johan Bos

TL;DR

Five neural models are evaluated on semantic parsing and meaning-to-text generation. Their performance declines on the challenge sets, in some cases dramatically, which reveals the limitations of neural models when confronting such challenges.

Abstract

The Parallel Meaning Bank (PMB) serves as a corpus for semantic processing with a focus on semantic parsing and text generation. Neural parsers and generators currently achieve excellent performance on the PMB, which might suggest that these semantic processing tasks have by and large been solved. We argue that this is not the case: past performance scores on the PMB are inflated by non-optimal data splits and by test sets that are too easy. In response, we introduce several changes. First, instead of the prior random split, we propose a more systematic splitting approach to improve the reliability of the standard test data. Second, in addition to the standard test set, we propose two challenge sets: one with longer texts that include discourse structure, and one that addresses compositional generalization. We evaluate five neural models for semantic parsing and meaning-to-text generation. Our results show that model performance declines (in some cases dramatically) on the challenge sets, revealing the limitations of neural models when confronting such challenges.

Paper Structure (23 sections, 2 equations, 4 figures, 10 tables, 3 algorithms)

Figures (4)

  • Figure 1: (a) An example sentence "Bill did not commit the crime." taken from the PMB in six languages with its DRS in (b) box notation, (c) clause notation, (d) sequence box notation, and (e) graph notation.
  • Figure 2: Two recombination operations performed on the CCG derivation tree of the example sentence "I have a dog": (b) substitution and (c) extension. We retained only the CCG categories and their corresponding words/phrases, excluding other semantic information.
  • Figure 3: Distribution of word overlap rates between train and test sets in EN, DE, NL, IT. Lower overlap rates signify fewer words occurring in both train and test sets.
  • Figure 4: Distribution of word overlap rates between train and development sets in EN, DE, NL, IT.
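
The captions of Figures 3 and 4 refer to word overlap rates without defining them here. As a rough illustration only, the sketch below assumes one plausible definition: for each held-out sentence, the fraction of its whitespace-separated, lowercased tokens that also occur somewhere in the training set. The function names, the tokenization, and the held-out sentences are hypothetical and are not taken from the paper, whose exact computation may differ.

```python
from typing import Iterable, List, Set


def build_vocab(sentences: Iterable[str]) -> Set[str]:
    """Collect the lowercased whitespace tokens seen in a set of sentences."""
    vocab: Set[str] = set()
    for sentence in sentences:
        vocab.update(sentence.lower().split())
    return vocab


def overlap_rate(sentence: str, train_vocab: Set[str]) -> float:
    """Fraction of a sentence's tokens that also occur in the training vocabulary.

    Assumed definition for illustration; an empty sentence gets a rate of 0.0.
    """
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0
    return sum(token in train_vocab for token in tokens) / len(tokens)


def overlap_distribution(train: Iterable[str], heldout: Iterable[str]) -> List[float]:
    """One overlap rate per held-out sentence, measured against the training set."""
    train_vocab = build_vocab(train)
    return [overlap_rate(sentence, train_vocab) for sentence in heldout]


# Training sentences reuse the examples quoted in the figure captions;
# the held-out sentences are made up for this sketch.
train = ["I have a dog", "Bill did not commit the crime"]
heldout = ["I have a cat", "Tom did not sleep"]
rates = overlap_distribution(train, heldout)
```

Under this assumed definition, a histogram of `rates` would correspond to one panel of the distribution plots, and a split that pushes the values lower would contain more held-out words unseen during training.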