Generating Text from Uniform Meaning Representation
Emma Markle, Reihaneh Iranmanesh, Shira Wein
TL;DR
The paper addresses generating fluent text from Uniform Meaning Representation (UMR), a multilingual graph-based semantic representation. It proposes three strategies that build on Abstract Meaning Representation (AMR) technologies: a baseline that passes UMR graphs directly to AMR-to-text generation models, a pipeline that converts UMR to AMR before generation, and fine-tuning of both AMR-to-text models and foundation models on UMR data. Empirical results show that fine-tuned AMR-to-text models yield the strongest multilingual performance, with the best-performing models reaching multilingual BERTScores of 0.825 for English and 0.882 for Chinese, and that document-level information provides gains even for single-sentence outputs. The work highlights data limitations, especially for Indigenous languages, and demonstrates the value of AMR-informed fine-tuning as a first step toward a UMR-to-text ecosystem across languages.
Abstract
Uniform Meaning Representation (UMR) is a recently developed graph-based semantic representation that expands on Abstract Meaning Representation (AMR) in a number of ways, in particular through the inclusion of document-level information and multilingual flexibility. To adopt and leverage UMR effectively for downstream tasks, effort must be directed toward developing a UMR technological ecosystem. Though only a small number of UMR annotations have been produced to date, in this work we investigate the first approaches to producing text from multilingual UMR graphs. Exploiting the structural similarity between UMR and AMR graphs and the wide availability of AMR technologies, we introduce (1) a baseline approach that passes UMR graphs to AMR-to-text generation models, (2) a pipeline approach that converts UMR to AMR and then applies AMR-to-text generation models, and (3) a fine-tuning approach for both foundation models and AMR-to-text generation models with UMR data. Our best-performing models achieve multilingual BERTScores of 0.825 for English and 0.882 for Chinese, a promising indication of the effectiveness of fine-tuning approaches for UMR-to-text generation even with limited UMR data.
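To make the pipeline idea in approach (2) concrete, the following is a minimal sketch of one possible preprocessing step: removing UMR-specific attributes from a PENMAN-style graph string so that the result looks closer to AMR. It is an illustration under stated assumptions, not the conversion used in the paper. The function name `strip_umr_attributes`, the choice of attributes to drop, and the example graph are all hypothetical, and a real conversion would likely also need to remap roles and concepts.

```python
import re

# Assumption: these sentence-level UMR attributes have no direct AMR
# counterpart, so this sketch simply drops them. The paper's actual
# conversion rules may differ.
UMR_ONLY_ATTRIBUTES = ("aspect", "modal-strength", "refer-person", "refer-number")

# Matches leading whitespace, one of the attributes above, and its atomic
# value (a token with no whitespace or parentheses).
_ATTR_PATTERN = re.compile(
    r"\s*:(?:%s)\s+[^\s()]+" % "|".join(re.escape(a) for a in UMR_ONLY_ATTRIBUTES)
)


def strip_umr_attributes(graph: str) -> str:
    """Remove UMR-only attribute/value pairs from a PENMAN-style graph string."""
    return _ATTR_PATTERN.sub("", graph)


# Hypothetical UMR-style graph, invented for illustration only.
example_umr = """(s1t / taste-01
    :ARG0 (s1p / person
        :refer-person 3rd
        :refer-number singular)
    :ARG1 (s1s / soup)
    :aspect performance
    :modal-strength full-affirmative)"""

amr_like = strip_umr_attributes(example_umr)
print(amr_like)
```

The output keeps the predicate and its core arguments while dropping the aspect, modality, and person/number annotations. A regular expression is enough for this sketch because the dropped values are atomic tokens; attributes whose values are nested subgraphs are left untouched and would need a proper graph parser.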
