
HuAMR: A Hungarian AMR Parser and Dataset

Botond Barta, Endre Hamerlik, Milán Konor Nyist, Judit Ács

TL;DR

HuAMR introduces the first Hungarian AMR resource by translating AMR 3.0 into Hungarian and creating the HuAMR dataset. The study trains and evaluates Hungarian AMR parsers based on mT5 Large and Llama-3.2-1B, with silver-data augmentation from HuAMR and Europarl. mT5 Large consistently outperforms Llama-3.2-1B, and the gains from silver data saturate on the AMR$^{trans}$ test set. Gold AMR$^{trans}$ data provide a modest ~5% boost in Smatch F$_1$, underscoring the importance of high-quality data and sufficient model capacity for cross-lingual semantic parsing. Overall, the work delivers a reusable Hungarian AMR resource and demonstrates effective strategies for cross-lingual AMR parsing in low-resource settings.

Abstract

We present HuAMR, the first Abstract Meaning Representation (AMR) dataset and a suite of large language model-based AMR parsers for Hungarian, targeting the scarcity of semantic resources for non-English languages. To create HuAMR, we employed Llama-3.1-70B to automatically generate silver-standard AMR annotations, which we then refined manually to ensure quality. Building on this dataset, we investigate how different model architectures (mT5 Large and Llama-3.2-1B) and fine-tuning strategies affect AMR parsing performance. While incorporating silver-standard AMRs from Llama-3.1-70B into the training data of smaller models does not consistently boost overall scores, our results show that this augmentation effectively enhances parsing accuracy on Hungarian news data (the domain of HuAMR). We evaluate our parsers using Smatch scores and confirm the potential of HuAMR and our parsers for advancing semantic parsing research.
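Since the evaluation relies on Smatch, a minimal sketch of the underlying idea may help: Smatch scores a predicted AMR graph against a gold graph as the F$_1$ over matching triples. The sketch below is not the paper's evaluation code and not the official Smatch tool. It assumes the variables of both graphs are already aligned, whereas real Smatch searches for the best variable mapping. The function name `smatch_f1` and the toy triples (with simplified concept labels) are hypothetical.

```python
from collections import Counter


def smatch_f1(predicted, gold):
    """F1 over (source, relation, target) triples.

    Assumption: variable names in both graphs are already aligned,
    so triples can be compared directly. Real Smatch instead searches
    over variable mappings to maximise the number of matches.
    """
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    # Multiset intersection counts the triples present in both graphs.
    matched = sum((pred_counts & gold_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_counts.values())
    recall = matched / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy graph for "The boy wants to go" (simplified labels).
gold = [
    ("w", "instance", "want"),
    ("b", "instance", "boy"),
    ("g", "instance", "go"),
    ("w", "ARG0", "b"),
    ("w", "ARG1", "g"),
    ("g", "ARG0", "b"),
]
# A prediction that misses the re-entrant edge (the boy is also the goer).
predicted = gold[:-1]

score = smatch_f1(predicted, gold)
print(f"Smatch-style F1: {score:.3f}")
```

Here the prediction recovers five of the six gold triples, so precision is 1 and recall is 5/6. The alignment assumption keeps the sketch short; without it, the score would depend on the variable-mapping search that the official tool performs.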

Paper Structure

This paper contains 13 sections, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: AMR notations for the sentence "The Hungarian boy wants to go".
  • Figure 2: Smatch F$_1$ score as a function of the training data size.