MACT: Model-Agnostic Cross-Lingual Training for Discourse Representation Structure Parsing

Jiangming Liu

TL;DR

This work tackles cross-lingual discourse representation structure (DRS) parsing. It addresses the limitation of monolingual training by introducing a model-agnostic cross-lingual training approach that leverages multilingual data via pre-trained language models. It frames DRS parsing as conditional generation and applies parameter-efficient fine-tuning with LoRA to a large multilingual model (mT0-large), achieving state-of-the-art results on PMB 3.0.0 and 4.0.0 across English, German, Italian, and Dutch for both clause and graph forms, and it provides detailed analyses of well-formedness and error patterns. The method generalizes across languages without language identification in the training data, and it shows that combining cross-lingual training with language-specific fine-tuning is a practical way to improve performance, scalability, and data efficiency. The approach holds promise for multilingual semantic parsing tasks beyond DRS, particularly where data are unevenly available across languages and model efficiency is essential.

Abstract

Discourse Representation Structure (DRS) is an innovative semantic representation designed to capture the meaning of texts of arbitrary length across languages. Parsing text into this representation is essential for achieving natural language understanding through logical forms. Nevertheless, the performance of DRS parsing models remains constrained when they are trained exclusively on monolingual data. To tackle this issue, we introduce a cross-lingual training strategy. The proposed method is model-agnostic yet highly effective. It leverages cross-lingual training data and fully exploits the alignments between languages encoded in pre-trained language models. Experiments on the standard benchmarks demonstrate that models trained with the cross-lingual method show significant improvements in DRS clause and graph parsing in English, German, Italian, and Dutch. Compared with previous work, our final models achieve state-of-the-art results on the standard benchmarks. Furthermore, the detailed analysis provides deep insights into the performance of the parsers, offering inspiration for future research in DRS parsing. We will keep adding new benchmark results to the appendix.
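
One way to picture the cross-lingual training data described above is as a single pool of sentence-DRS pairs drawn from all languages, with no language tag added to the input. The following is a minimal sketch under that reading, not the authors' implementation: the function `build_cross_lingual_set`, the `corpora` layout, and the placeholder sentences and targets are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

# (input sentence, linearized DRS target); both are plain strings here.
Pair = Tuple[str, str]


def build_cross_lingual_set(
    corpora: Dict[str, List[Pair]], seed: int = 0
) -> List[Pair]:
    """Pool every language's pairs into one shuffled training set.

    The language code is used only to iterate over the corpora. It is
    never written into the source text, so the model sees no language tag.
    """
    pooled: List[Pair] = []
    for lang in sorted(corpora):  # sorted so the result is reproducible
        pooled.extend(corpora[lang])
    random.Random(seed).shuffle(pooled)
    return pooled


# Hypothetical placeholder data: the targets are stand-ins,
# not real PMB annotations.
corpora = {
    "en": [("Tom climbed up the telephone pole.", "<DRS-1>")],
    "de": [("<German sentence>", "<DRS-2>")],
    "it": [("<Italian sentence>", "<DRS-3>")],
    "nl": [("<Dutch sentence>", "<DRS-4>")],
}

train_set = build_cross_lingual_set(corpora)
```

Under these assumptions, `train_set` would be passed to whatever sequence-to-sequence fine-tuning routine is in use, and a language-specific stage could then continue training on a single entry of `corpora`.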

Paper Structure

This paper contains 38 sections, 2 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Training of an Italian semantic parser. (a) Monolingual training with multilingual data using machine translation systems. (b) Cross-lingual training without language identification.
  • Figure 2: DRSs of the English sentence "Tom climbed up the telephone pole" in (a) box form, (b) clause form, and (c) graph form.
  • Figure 3: Examples of DRSs in sequential graph form. The red arrows indicate the mapping from nodes to their corresponding items, and the green arrows indicate the argument positions.
  • Figure 4: The test results given by Cross-lingual+ for English DRS parsing in PMB 4.0.0 with input sentences of varying lengths.