Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing
Deokhyung Kang, Seonjeong Hwang, Yunsu Kim, Gary Geunbae Lee
TL;DR
This work tackles zero-resource cross-lingual semantic parsing (SP) by introducing Cross-lingual Back-Parsing (CBP), a data-augmentation framework that synthesizes target-language utterances from source-language meaning representations using a multilingual seq2seq backbone with language adapters. A source-switched denoising objective trains the adapters to control the output language, so that a target-language utterance $u_{tgt}$ can be generated from a source meaning representation $mr_{src}$; a round-trip consistency filter is then applied to ensure semantic fidelity. On Mschema2QA and Xspider, CBP yields substantial improvements in target-language exact match and high slot-value alignment while using only source-language labeled data and monolingual corpora, with no target-language labels, and it outperforms several MT-based baselines and competitive LLM prompting baselines. The findings indicate that zero-resource cross-lingual SP is practical and scalable, and suggest that CBP may be applicable to other cross-lingual generation tasks.
Abstract
Recent efforts have aimed to utilize multilingual pretrained language models (mPLMs) to extend semantic parsing (SP) across multiple languages without requiring extensive annotations. However, achieving zero-shot cross-lingual transfer for SP remains challenging, leading to a performance gap between source and target languages. In this study, we propose Cross-lingual Back-Parsing (CBP), a novel data augmentation methodology designed to enhance cross-lingual transfer for SP. Leveraging the representation geometry of mPLMs, CBP synthesizes target-language utterances from source-language meaning representations. Our methodology effectively performs cross-lingual data augmentation in challenging zero-resource settings, using only labeled data in the source language and monolingual corpora. Extensive experiments on two cross-lingual SP benchmarks (Mschema2QA and Xspider) demonstrate that CBP brings substantial gains in the target language. Further analysis of the synthesized utterances shows that our method successfully generates target-language utterances with high slot-value alignment rates while preserving semantic integrity. Our code and data are publicly available at https://github.com/deokhk/CBP.
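
The round-trip consistency filter mentioned in the TL;DR can be illustrated with a minimal Python sketch. This is not the released implementation: the function names (`round_trip_filter`, `normalize_mr`, `toy_parse`), the SQL-style meaning representations, and the exact-match criterion after whitespace normalization are all assumptions made for illustration. The idea shown is only that a synthesized target-language utterance is kept when a parser maps it back to the meaning representation it was generated from.

```python
from typing import Callable, Iterable, List, Tuple


def normalize_mr(mr: str) -> str:
    """Collapse runs of whitespace so formatting differences do not matter."""
    return " ".join(mr.split())


def round_trip_filter(
    candidates: Iterable[Tuple[str, str]],
    parse: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Keep (utterance, source_mr) pairs whose utterance parses back to source_mr.

    `parse` stands in for a semantic parser; the exact-match test is an
    assumed criterion, chosen here for simplicity.
    """
    kept = []
    for utterance, source_mr in candidates:
        predicted_mr = parse(utterance)
        if normalize_mr(predicted_mr) == normalize_mr(source_mr):
            kept.append((utterance, source_mr))
    return kept


# Hypothetical stand-in for a trained parser: a fixed lookup table.
_TOY_PARSES = {
    "zeige mir Restaurants in Seoul":
        "SELECT name FROM restaurant WHERE city = 'Seoul'",
    "zeige mir Hotels in Seoul":
        "SELECT name FROM hotel WHERE city = 'Seoul'",
}


def toy_parse(utterance: str) -> str:
    """Return the toy parse for an utterance, or an empty string if unknown."""
    return _TOY_PARSES.get(utterance, "")


# Two synthesized German candidates for the same source meaning representation.
SOURCE_MR = "SELECT name  FROM restaurant WHERE city = 'Seoul'"
CANDIDATES = [
    ("zeige mir Restaurants in Seoul", SOURCE_MR),  # consistent with the MR
    ("zeige mir Hotels in Seoul", SOURCE_MR),       # meaning drifted
]

FILTERED = round_trip_filter(CANDIDATES, toy_parse)
```

In this toy setup only the first candidate survives, because the second one parses to a query over a different table. A real pipeline would replace `toy_parse` with a trained parser and might use a softer comparison than exact match.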
