
HiTZ at VarDial 2025 NorSID: Overcoming Data Scarcity with Language Transfer and Automatic Data Annotation

Jaione Bengoetxea, Mikel Zubillaga, Ekhi Azurmendi, Maite Heredia, Julen Etxaniz, Markel Ferro, Jeremy Barnes

TL;DR

This work tackles data scarcity in Norwegian dialect NLP by combining cross-lingual transfer with multitask learning for intent detection and slot filling, leveraging the xSID corpus and noMusic development data. It shows that an English-only multitask model can outperform multilingual variants, while dialect identification benefits from training focused on the development data and from diverse, semi-automatically annotated data sources. The study provides a comprehensive analysis of data sources (NorDial, NTS, NB Samtale, NDC) and methods (lexical mapping, encoder/decoder fine-tuning, few-shot learning and automatic data annotation), and identifies domain specificity as a key factor in performance. Overall, the results demonstrate practical approaches to overcoming data scarcity in dialect-rich SLU, with implications for resource creation and model design for low-resource language variants.
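
Lexical mapping is listed among the methods but not detailed here. One common form is a word-for-word substitution from a standard variety into dialectal forms, which keeps slot-filling (BIO) tags aligned because each token maps to exactly one token. The sketch below assumes that one-to-one design; the lexicon `TOY_LEXICON`, the helper `map_tokens` and the tag `B-location` are hypothetical illustrations, not the authors' resources or label set.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (standard Bokmål -> dialect-like forms).
# It is NOT the lexical resource used in the paper.
TOY_LEXICON: Dict[str, str] = {
    "jeg": "eg",      # "I"
    "ikke": "ikkje",  # "not"
    "hva": "kva",     # "what"
}


def map_tokens(
    tokens: List[str], tags: List[str], lexicon: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Replace tokens found in the lexicon (case-insensitive lookup).

    The substitution is strictly one token for one token, so the BIO
    slot tags stay aligned and are copied over unchanged.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    mapped = [lexicon.get(tok.lower(), tok) for tok in tokens]
    return mapped, list(tags)


if __name__ == "__main__":
    toks = ["hva", "er", "været", "i", "Bergen"]
    bio = ["O", "O", "O", "O", "B-location"]
    print(map_tokens(toks, bio, TOY_LEXICON))
```

Multi-word or many-to-one mappings would break this alignment and require re-projecting the tags, which is why the sketch restricts itself to single-token entries.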

Abstract

In this paper we present our submission for the NorSID Shared Task as part of the 2025 VarDial Workshop (Scherrer et al., 2025), consisting of three tasks: Intent Detection, Slot Filling and Dialect Identification, evaluated using data in different dialects of the Norwegian language. For Intent Detection and Slot Filling, we fine-tuned a multitask model in a cross-lingual setting to leverage the xSID dataset, which is available in 17 languages. For Dialect Identification, our final submission is a model fine-tuned on the provided development set, which obtained the highest scores in our experiments. Our final results on the test set show no drop in performance compared to the development set, likely due to the domain specificity of the dataset and the similar distribution of both subsets. We also report an in-depth analysis of the provided datasets and their artifacts, as well as additional experiments that were carried out but did not yield the best results. Finally, we analyze why some methods were more successful than others, focusing mainly on the impact of the combination of languages and the domain specificity of the training data on the results.
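
The abstract does not spell out how the multitask model combines its two objectives. A widely used formulation for joint intent detection and slot filling sums a sentence-level cross-entropy on the intent and an averaged token-level cross-entropy on the slot tags. The pure-Python sketch below shows that formulation under stated assumptions: the weight `slot_weight` and the `ignore_index=-100` marker for positions excluded from the slot loss (the PyTorch `CrossEntropyLoss` default, commonly used for subword continuations) are illustrative choices, not the authors' confirmed settings.

```python
import math
from typing import List


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-softmax of the target class, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(
    intent_logits: List[float],
    intent_target: int,
    slot_logits: List[List[float]],
    slot_targets: List[int],
    slot_weight: float = 1.0,   # assumption: relative weight of the slot task
    ignore_index: int = -100,   # assumption: positions skipped in the slot loss
) -> float:
    """Intent cross-entropy plus weighted mean slot cross-entropy."""
    intent_loss = cross_entropy(intent_logits, intent_target)
    token_losses = [
        cross_entropy(logits, tag)
        for logits, tag in zip(slot_logits, slot_targets)
        if tag != ignore_index
    ]
    slot_loss = sum(token_losses) / len(token_losses) if token_losses else 0.0
    return intent_loss + slot_weight * slot_loss
```

In a real model both sets of logits would come from one shared encoder (an intent head on the pooled representation, a slot head on every token), so a single backward pass on this sum updates the encoder for both tasks at once.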

Paper Structure

This paper contains 38 sections, 1 equation, 2 figures, 13 tables.

Figures (2)

  • Figure 1: The idea behind the multitask model fine-tuned for both intent detection and slot filling tasks at the same time.
  • Figure 2: Accuracy of pretrained English models (BERT, RoBERTa), a multilingual model (XLM-RoBERTa) and a Norwegian pretrained model (NorBERT3) trained for Intent Detection on the Norwegian train set and evaluated on the development set.