The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding

Kenneth Enevoldsen, Márton Kardos, Niklas Muennighoff, Kristoffer Laigaard Nielbo

TL;DR

The Scandinavian Embedding Benchmark (SEB) tackles the evaluation gap for text embeddings in Mainland Scandinavian languages by providing a comprehensive, open-source framework that spans 24 tasks across multiple domains and four task categories. It benchmarks more than 26 models, including publicly available models and commercial APIs, and demonstrates significant performance disparities, especially in retrieval, between public models and commercial solutions. By integrating SEB with MTEB and offering a model registry and dashboard, the work enables reproducible, cross-lingual benchmarking and supports progress toward robust Scandinavian and multilingual embeddings, with practical relevance for public institutions and industry. The findings highlight critical areas for improvement and establish a scalable path for expanding the benchmark to richer domains and tasks.

Abstract

The evaluation of English text embeddings has transitioned from evaluating a handful of datasets to broad coverage across many tasks through benchmarks such as MTEB. However, this is not the case for multilingual text embeddings due to a lack of available benchmarks. To address this problem, we introduce the Scandinavian Embedding Benchmark (SEB). SEB is a comprehensive framework that enables text embedding evaluation for Scandinavian languages across 24 tasks, 10 subtasks, and 4 task categories. Building on SEB, we evaluate more than 26 models, uncovering significant performance disparities between public and commercial solutions not previously captured by MTEB. We open-source SEB and integrate it with MTEB, thus bridging the text embedding evaluation gap for Scandinavian languages.

Paper Structure

This paper contains 23 sections, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: An overview of the tasks and datasets in SEB. Flags denote the languages of the datasets.
  • Figure 2: Dataset similarity between the datasets included in SEB. Embeddings are obtained by applying embed-multilingual-v3.0 to 100 randomly sampled documents. Similarity is computed using cosine similarity.
  • Figure 3: Performance and speed of embedding models. The size of the circles denotes the embedding size, and the color denotes the model type. Note that commercial APIs are not included. WPS stands for words per second. Not all models are highlighted, to improve readability.
  • Figure 4: The embeddings of 100 randomly sampled documents from each task, embedded using embed-multilingual-v3.0 and projected using UMAP. The projection uses the cosine metric but otherwise default parameter values.
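
The dataset-similarity computation described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the caption does not say how per-document embeddings are aggregated, so the sketch assumes each dataset is summarized by the mean of its sampled document embeddings. Random vectors stand in for embed-multilingual-v3.0 output, and the function name `dataset_similarity` and the dataset names are hypothetical.

```python
import numpy as np


def dataset_similarity(embeddings_by_dataset):
    """Cosine similarity between mean document embeddings of each dataset.

    embeddings_by_dataset maps a dataset name to an array of shape
    (n_documents, embedding_dim). Returns (names, similarity_matrix).
    """
    names = list(embeddings_by_dataset)
    # Assumption: one centroid (mean embedding) represents each dataset.
    centroids = np.stack([embeddings_by_dataset[n].mean(axis=0) for n in names])
    # L2-normalize rows so that the dot product equals cosine similarity.
    centroids = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return names, centroids @ centroids.T


rng = np.random.default_rng(0)
# Stand-in data: 100 sampled "documents" per dataset, 8-dimensional vectors.
fake_embeddings = {
    name: rng.normal(size=(100, 8))
    for name in ["dataset_a", "dataset_b", "dataset_c"]
}
names, sim = dataset_similarity(fake_embeddings)
```

With real data, the random arrays would be replaced by embeddings of the 100 sampled documents per dataset; a different aggregation (for example, averaging all pairwise document similarities) would be an equally plausible reading of the caption.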