jina-embeddings-v5-text: Task-Targeted Embedding Distillation

Mohammad Kalim Akram; Saba Sturua; Nastia Havriushenko; Quentin Herreros; Michael Günther; Maximilian Werk; Han Xiao

TL;DR

This paper addresses efficient multilingual text embeddings for information retrieval by combining distillation with task-specific training. It introduces jina-embeddings-v5-text-small and jina-embeddings-v5-text-nano, trained in two stages: embedding distillation from a large teacher model, followed by training separate LoRA adapters for retrieval, semantic similarity, clustering, and classification. Ablation studies indicate that combining embedding-level distillation with task-specific objectives improves retrieval performance, and the resulting embeddings remain robust under truncation and quantization. On the MTEB and MMTEB benchmarks, the models match or exceed other models of similar size. The models also support long contexts (up to 32k tokens), and the weights and integration tooling are publicly released to support embedding research and practical IR deployments.
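To make the idea of combining embedding-level distillation with a contrastive objective concrete, here is a minimal sketch in plain Python. It is not the paper's loss: the cosine-distance distillation term, the in-batch InfoNCE term, the weighted sum with `alpha`, the temperature value, and the assumption that student and teacher embeddings share one dimensionality are all illustrative choices, and every function name is hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def distillation_loss(student, teacher):
    """Mean (1 - cosine) over paired student/teacher embeddings.

    Assumption: both models emit vectors of the same size.
    """
    pairs = list(zip(student, teacher))
    return sum(1.0 - cosine(s, t) for s, t in pairs) / len(pairs)

def info_nce_loss(queries, docs, temperature=0.05):
    """In-batch contrastive loss; docs[i] is the positive for queries[i]."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, d) / temperature for d in docs]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(queries)

def combined_loss(student_q, student_d, teacher_q, teacher_d, alpha=0.5):
    """Hypothetical weighted sum of the distillation and contrastive terms."""
    distill = distillation_loss(student_q + student_d, teacher_q + teacher_d)
    contrast = info_nce_loss(student_q, student_d)
    return alpha * distill + (1.0 - alpha) * contrast
```

In this sketch, `alpha = 1.0` reduces to pure distillation and `alpha = 0.0` to pure contrastive training, the two baselines that the summary says the combined approach improves on.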

Abstract

Text embedding models are widely used for semantic similarity tasks, including information retrieval, clustering, and classification. General-purpose models are typically trained with single- or multi-stage processes using contrastive loss functions. We introduce a novel training regimen that combines model distillation techniques with task-specific contrastive loss to produce compact, high-performance embedding models. Our findings suggest that this approach is more effective for training small models than purely contrastive or distillation-based training paradigms alone. Benchmark scores for the resulting models, jina-embeddings-v5-text-small and jina-embeddings-v5-text-nano, exceed or match the state-of-the-art for models of similar size. jina-embeddings-v5-text models additionally support long texts (up to 32k tokens) in many languages, and generate embeddings that remain robust under truncation and binary quantization. Model weights are publicly available, hopefully inspiring further advances in embedding model development.
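As an illustration of what truncation and binary quantization mean for an embedding vector, the following is a minimal sketch in plain Python. The text above does not specify how either operation is performed for these models, so the choices here (keeping the leading components and re-normalizing, thresholding at zero, comparing codes by bit agreement) are common conventions assumed for illustration, and the function names are hypothetical.

```python
import math

def truncate(embedding, dim):
    """Keep the first `dim` components and re-normalize to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to normalize in an all-zero prefix
    return [x / norm for x in head]

def binarize(embedding):
    """Map each component to one bit: 1 if strictly positive, else 0."""
    return [1 if x > 0 else 0 for x in embedding]

def hamming_similarity(bits_a, bits_b):
    """Fraction of positions on which two equal-length binary codes agree."""
    agree = sum(1 for a, b in zip(bits_a, bits_b) if a == b)
    return agree / len(bits_a)
```

A model is robust in this sense when rankings computed from the shortened or binarized vectors stay close to rankings computed from the full-precision embeddings.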

Paper Structure (44 sections, 12 equations, 7 figures, 25 tables)

Introduction
Related Work
Language Model Distillation
Embedding Model Distillation
Task-Specific Embedding Training
Model Architecture
Training
Embedding Distillation
Task-Specific Adapter Training
First-Stage: Embedding Distillation
Positional Information
Loss Function
Training Procedure
General-Purpose Training
Long Context Training
...and 29 more sections

Figures (7)

Figure 1: Architecture of jina-embeddings-v5-text.
Figure 2: Performance of j-v5-text-small across languages on MMTEB, compared to other models.
Figure 3: Performance comparison of different training objectives. Average nDCG@10 on the MTEB (English, v2) benchmark for S2ORC (left) and the full training data mixture (right).
Figure 4: Comparison of projection configurations on S2ORC. Performance is measured by average nDCG@10 on MTEB (English, v2).
Figure 5: Average MMTEB score across reduced embedding dimensions.
...and 2 more figures
