GPT-SW3: An Autoregressive Language Model for the Nordic Languages

Ariel Ekgren; Amaru Cuba Gyllensten; Felix Stollenwerk; Joey Öhman; Tim Isbister; Evangelia Gogoulou; Fredrik Carlsson; Alice Heiman; Judit Casademont; Magnus Sahlgren

Abstract

This paper details the process of developing the first native large generative language model for the Nordic languages, GPT-SW3. We cover all parts of the development process, from data collection and processing, through training configuration and instruction finetuning, to evaluation and considerations for release strategies. We hope that this paper can serve as a guide and reference for other researchers who undertake the development of large generative models for smaller languages.

Paper Structure (13 sections, 4 equations, 3 figures, 10 tables)

Introduction
Related Work
Data
Tokenizer
Training
Energy consumption
Instruction finetuning
Evaluation
Release plan
Discussion
Bibliographical References
Data Weighting
Scaling Analysis

Figures (3)

Figure 1: Normalized learning rate schedule. The maxima of the learning rate are given in Table \ref{tab:training_runs}.
Figure 2: Validation loss during training.
Figure 3: Scaling behaviour of GPT-SW3. The validation loss is shown as a function of the model size, while the dataset size is kept constant at 320B tokens for all models. The 20B parameter model (empty circle) is excluded from the fit (dashed curve). The gray, solid curve represents the scaling law from Hoffmann et al. (2022).
