SwissBERT: The Multilingual Language Model for Switzerland

Jannis Vamvas, Johannes Graën, Rico Sennrich

TL;DR

SwissBERT targets Switzerland's multilingual setting by extending the Cross-lingual Modular Transformer (X-MOD) with language adapters for German, French, Italian, and Romansh, and by domain-adapting to a Swiss news corpus (~12 billion tokens). It investigates two vocabulary strategies (reusing the XLM-R vocabulary vs. a new 50k vocabulary) and demonstrates that adapter-based multilingual models can achieve strong cross-lingual transfer, especially for Romansh, across tasks such as NER, stance detection, sentence retrieval, and word alignment. The study evaluates SwissBERT against general-purpose and specialized baselines, showing its advantages in Switzerland-related NLP and its robustness to domain shifts, while also outlining limitations and future work on extending the model to Swiss German dialects. The authors release the model and code for non-commercial use, enabling researchers to build and adapt Swiss-language NLP tools for academic and public-interest applications.

Abstract

We present SwissBERT, a masked language model created specifically for processing Switzerland-related text. SwissBERT is a pre-trained model that we adapted to news articles written in the national languages of Switzerland -- German, French, Italian, and Romansh. We evaluate SwissBERT on natural language understanding tasks related to Switzerland and find that it tends to outperform previous models on these tasks, especially when processing contemporary news and/or Romansh Grischun. Since SwissBERT uses language adapters, it may be extended to Swiss German dialects in future work. The model and our open-source code are publicly released at https://github.com/ZurichNLP/swissbert.

Paper Structure (45 sections, 3 figures, 12 tables)

Figures (3)

  • Figure 1: SwissBERT is a transformer encoder with language adapters (Pfeiffer et al., 2022) in each layer. There is an adapter for each national language of Switzerland. The other parameters in the model are shared among the four languages.
  • Figure 2: We train two variants of SwissBERT: Variant 1 reuses the vocabulary and embeddings of the pre-trained model, and only language adapters are trained. Variant 2 uses a custom SwissBERT vocabulary based on our pre-training corpus, and multilingual embeddings are trained in addition to the adapters.
  • Figure 3: Number of tokens (in terms of XLM-R vocabulary) per year in the training set.
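
The architecture described in the Figure 1 caption (shared parameters plus one adapter per language in each layer) can be illustrated with a toy sketch. This is not the authors' implementation: the language codes, the class names `Adapter` and `ModularLayer`, the dimensions, and the bottleneck-with-residual adapter form are all assumptions chosen for illustration, and a single linear map stands in for the shared transformer sublayers.

```python
import random

# Hypothetical language codes, used here only as dictionary keys.
LANGUAGES = ["de_CH", "fr_CH", "it_CH", "rm_CH"]


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Create a rows x cols matrix with small random entries."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class Adapter:
    """Assumed adapter form: down-project, ReLU, up-project, add residual."""

    def __init__(self, hidden, bottleneck, rng):
        self.down = random_matrix(bottleneck, hidden, rng)
        self.up = random_matrix(hidden, bottleneck, rng)

    def __call__(self, h):
        z = [max(0.0, v) for v in matvec(self.down, h)]
        return [a + b for a, b in zip(h, matvec(self.up, z))]


class ModularLayer:
    """Toy layer: one shared linear map plus one adapter per language."""

    def __init__(self, hidden, bottleneck, rng):
        # Shared among all languages (stand-in for attention and feed-forward).
        self.shared = random_matrix(hidden, hidden, rng)
        # One separate set of adapter parameters per language.
        self.adapters = {lang: Adapter(hidden, bottleneck, rng) for lang in LANGUAGES}

    def __call__(self, h, lang):
        h = matvec(self.shared, h)      # same parameters for every language
        return self.adapters[lang](h)   # routed through the language's adapter


rng = random.Random(0)
layer = ModularLayer(hidden=4, bottleneck=2, rng=rng)
x = [1.0, -0.5, 0.25, 2.0]
out_de = layer(x, "de_CH")  # German route
out_rm = layer(x, "rm_CH")  # Romansh route, same shared weights
```

In this sketch, adding a language means adding one more entry to `adapters` while leaving `shared` untouched, which is one way to picture how the abstract's suggested extension to Swiss German dialects could work.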