A Family of Pretrained Transformer Language Models for Russian
Dmitry Zmitrovich, Alexander Abramov, Andrey Kalmykov, Maria Tikhonova, Ekaterina Taktasheva, Danil Astafurov, Mark Baushenko, Artem Snegirev, Vitalii Kadulin, Sergey Markov, Tatiana Shavrina, Vladislav Mikhailov, Alena Fenogenova
TL;DR
This work addresses the scarcity of monolingual Transformer LMs for Russian by introducing a family of 13 pretrained models across encoder, decoder, and encoder-decoder architectures. It documents extensive pretraining on a diverse Russian corpus and provides detailed architecture and training configurations for the ruBERT, ruRoBERTa, ruELECTRA, ruGPT-3, ruT5, and FRED-T5 variants. Comprehensive evaluation across language understanding and generation tasks (including Russian SuperGLUE, RuCoLA, and detoxification) shows state-of-the-art performance for several models, particularly FRED-T5-XL, and strong cross-task generalization relative to multilingual baselines. The models are publicly released under the MIT license so that researchers and industry can advance Russian NLP applications. The paper also discusses limitations and ethical considerations for responsible deployment and future improvements.
Abstract
Transformer language models (LMs) are fundamental to NLP research methodologies and applications in various languages. However, developing such models specifically for the Russian language has received little attention. This paper introduces a collection of 13 Russian Transformer LMs spanning encoder (ruBERT, ruRoBERTa, ruELECTRA), decoder (ruGPT-3), and encoder-decoder (ruT5, FRED-T5) architectures. We report on the model architecture design and pretraining, and present the results of evaluating the models' generalization abilities on Russian language understanding and generation datasets and benchmarks. By pretraining and releasing these specialized Transformer LMs, we aim to broaden the scope of NLP research directions and enable the development of industrial solutions for the Russian language.
