Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect

Jannis Vamvas, Noëmi Aepli, Rico Sennrich

TL;DR

This work addresses the scarcity of written Swiss German data by adapting existing multilingual encoders through continued pre-training. It compares monolithic adaptation (updating the full model) with modular approaches, including a subword-level Swiss German adapter and a Canine-style character-level adapter built on SwissBERT. Across three downstream tasks, simply adding a Swiss German adapter to a modular encoder reaches 97.5% of the performance of fully monolithic adaptation. For retrieving Swiss German sentences given Standard German queries, adapting a character-level model is more effective than the other adaptation strategies. The authors release four trained Swiss German encoders and their code.

Abstract

Creating neural text encoders for written Swiss German is challenging due to a dearth of training data combined with dialectal variation. In this paper, we build on several existing multilingual encoders and adapt them to Swiss German using continued pre-training. Evaluation on three diverse downstream tasks shows that simply adding a Swiss German adapter to a modular encoder achieves 97.5% of fully monolithic adaptation performance. We further find that for the task of retrieving Swiss German sentences given Standard German queries, adapting a character-level model is more effective than the other adaptation strategies. We release our code and the models trained for our experiments at https://github.com/ZurichNLP/swiss-german-text-encoders

Paper Structure (25 sections, 10 tables)