Improving Dialectal Slot and Intent Detection with Auxiliary Tasks: A Multi-Dialectal Bavarian Case Study
Xaver Maria Krückl, Verena Blaschke, Barbara Plank
TL;DR
This study addresses slot and intent detection (SID) for Bavarian dialects in a zero-shot transfer setting: encoder-only PLMs are fine-tuned on English SID data and evaluated on Bavarian test sets, including a newly released Munich Bavarian dataset. It systematically compares a baseline with multi-task learning (MTL) and intermediate-task training (ITT) setups, using three Bavarian auxiliary tasks to analyze cross-dialect transfer: syntactic dependencies and POS tagging (UD), NER (BarNER), and masked language modeling (MLM). Auxiliary tasks mainly improve slot filling, with NER providing the strongest gains, and ITT yields more consistent improvements than MTL. The best model (MLM×NER→SID) gains up to 5.1 pp in intent accuracy and 8.4 pp in slot F1 on Bavarian data. Across the Bavarian varieties and the additional test languages (Swiss German, Standard German, English), the results suggest consistent transfer patterns with some dialect-specific differences, and the new Munich dataset makes it possible to evaluate intra-dialect variation. The work offers practical guidance on using auxiliary tasks and ITT for dialectal SID, releases the new data, and provides open-source tooling for cross-dialect NLU research on digital assistants.
Abstract
Reliable slot and intent detection (SID) is crucial in natural language understanding for applications like digital assistants. Encoder-only transformer models fine-tuned on high-resource languages generally perform well on SID. However, they struggle with dialectal data, where no standardized form exists and training data is scarce and costly to produce. We explore zero-shot transfer learning for SID, focusing on multiple Bavarian dialects, for which we release a new dataset for the Munich dialect. We evaluate models trained on auxiliary tasks in Bavarian, and compare joint multi-task learning with intermediate-task training. We also compare three types of auxiliary tasks: token-level syntactic tasks, named entity recognition (NER), and language modelling. We find that the included auxiliary tasks have a more positive effect on slot filling than intent classification (with NER having the most positive effect), and that intermediate-task training yields more consistent performance gains. Our best-performing approach improves intent classification performance on Bavarian dialects by 5.1 and slot filling F1 by 8.4 percentage points.
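The difference between joint multi-task learning and intermediate-task training comes down to the order in which the model sees batches from each task. The following is a minimal, framework-free sketch of that ordering only, not the authors' training code. The function names, the `steps_per_task` parameter, and the round-robin interleaving used for the joint setup are all illustrative assumptions; a real joint setup might sample batches in proportion to dataset size instead.

```python
from itertools import cycle, islice


def itt_schedule(aux_tasks, target_task, steps_per_task):
    """Intermediate-task training (hypothetical helper).

    All auxiliary-task steps are completed first; only then is the
    model fine-tuned on the target task.
    """
    schedule = []
    for task in aux_tasks:
        schedule.extend([task] * steps_per_task)
    schedule.extend([target_task] * steps_per_task)
    return schedule


def mtl_schedule(aux_tasks, target_task, steps_per_task):
    """Joint multi-task learning (hypothetical helper).

    Batches from the auxiliary tasks and the target task are
    interleaved within a single training phase. Round-robin order
    is an assumption made for illustration.
    """
    tasks = [*aux_tasks, target_task]
    return list(islice(cycle(tasks), steps_per_task * len(tasks)))


if __name__ == "__main__":
    # Bavarian NER as the auxiliary task, English SID as the target.
    print("ITT:", itt_schedule(["NER"], "SID", steps_per_task=2))
    print("MTL:", mtl_schedule(["NER"], "SID", steps_per_task=2))
```

Both schedules contain the same number of steps per task. In the intermediate-task schedule the final updates come only from SID, whereas in the joint schedule auxiliary and SID updates alternate until the end of training.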
