MELA: Multilingual Evaluation of Linguistic Acceptability
Ziyin Zhang, Yikang Liu, Weifang Huang, Junyu Mao, Rui Wang, Hai Hu
TL;DR
MELA introduces a large-scale multilingual benchmark for linguistic acceptability, the largest to date, with 46K labeled sentences across 10 languages, enabling cross-lingual analysis and syntax probing. The authors benchmark a range of LLMs and XLM-R, finding that GPT-4o shows the strongest multilingual performance, outperforming fine-tuned XLM-R, and that in-language prompting significantly boosts few-shot results. Cross-lingual transfer proves non-trivial, and the effect of training-data size is nuanced. Edge probing further shows that fine-tuning XLM-R on MELA improves its syntax-related representations, suggesting that acceptability training fosters syntactic knowledge. The dataset fills a gap in multilingual linguistic evaluation and provides a resource for further cross-lingual, syntactic, and interpretability research, with data available at https://github.com/sjtu-compling/MELA.
Abstract
In this work, we present the largest benchmark to date on linguistic acceptability: Multilingual Evaluation of Linguistic Acceptability (MELA), with 46K samples covering 10 languages from a diverse set of language families. We establish LLM baselines on this benchmark and investigate cross-lingual transfer in acceptability judgment with XLM-R. In pursuit of multilingual interpretability, we conduct probing experiments with fine-tuned XLM-R to explore the process of syntax capability acquisition. Our results show that GPT-4o exhibits strong multilingual ability, outperforming fine-tuned XLM-R, while open-source multilingual models lag noticeably behind. Cross-lingual transfer experiments show that transfer in acceptability judgment is non-trivial: fine-tuning on 500 Icelandic examples yields an MCC of 23 on Chinese, a completely unrelated language. Results of our probing experiments indicate that training on MELA improves the performance of XLM-R on syntax-related tasks. Our data is available at https://github.com/sjtu-compling/MELA.
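The abstract reports results in MCC without defining it. Assuming this refers to the Matthews correlation coefficient, as is conventional for acceptability benchmarks, and that scores are scaled by 100 (so "23" corresponds to a coefficient of 0.23), the metric can be sketched as follows. The function name and the toy labels are illustrative only and are not taken from the MELA code or data.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = acceptable, 0 = unacceptable).

    Illustrative helper, not part of the MELA release.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: return 0 when any marginal count is zero,
        # e.g. when the model predicts a single class for every sentence.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example: 6 sentences, gold labels vs. model predictions.
gold = [1, 1, 1, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 1]
score = matthews_corrcoef(gold, pred)
print(f"MCC = {score:.3f} (reported as {100 * score:.0f} on a 0-100 scale)")
```

Unlike accuracy, MCC stays at 0 for a model that always predicts the majority class, which is why it is a common choice when acceptable and unacceptable sentences are imbalanced.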
