FinMTEB: Finance Massive Text Embedding Benchmark

Yixuan Tang, Yi Yang

TL;DR

This work introduces FinMTEB, the first comprehensive, finance-domain embedding benchmark featuring 64 datasets across seven tasks in English and Chinese, to address the inadequacy of general benchmarks for financial text. It also develops Fin-E5, a finance-adapted embedding model trained via persona-based data augmentation and contrastive learning, achieving state-of-the-art performance on FinMTEB. Key findings show that domain-adapted, LLM-based embeddings outperform general-purpose counterparts, general benchmarks poorly predict finance-task performance, and simple BoW methods can surpass dense embeddings on financial STS, underscoring gaps in current embedding techniques. Together, FinMTEB and Fin-E5 provide a robust framework and practical resources for advancing domain-specific financial NLP tools and applications.

Abstract

Embedding models play a crucial role in representing and retrieving information across various NLP applications. Recent advances in large language models (LLMs) have further enhanced the performance of embedding models. While these models are often benchmarked on general-purpose datasets, real-world applications demand domain-specific evaluation. In this work, we introduce the Finance Massive Text Embedding Benchmark (FinMTEB), a specialized counterpart to MTEB designed for the financial domain. FinMTEB comprises 64 financial domain-specific embedding datasets across 7 tasks that cover diverse text types in both Chinese and English, such as financial news articles, corporate annual reports, ESG reports, regulatory filings, and earnings call transcripts. We also develop a finance-adapted model, Fin-E5, using a persona-based data synthesis method to generate training data that covers diverse financial embedding tasks. Through extensive evaluation of 15 embedding models, including Fin-E5, we show three key findings: (1) performance on general-purpose benchmarks shows limited correlation with performance on financial domain tasks; (2) domain-adapted models consistently outperform their general-purpose counterparts; and (3) surprisingly, a simple Bag-of-Words (BoW) approach outperforms sophisticated dense embeddings in financial Semantic Textual Similarity (STS) tasks, underscoring current limitations in dense embedding techniques. Our work establishes a robust evaluation framework for financial NLP applications and provides crucial insights for developing domain-specific embedding models.
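To make finding (3) concrete, the following is a minimal sketch of how a Bag-of-Words baseline can score sentence pairs for an STS task: each sentence becomes a token-count vector and the pair is scored by cosine similarity. The abstract does not specify the paper's BoW configuration, so the tokenizer, the use of raw counts, and the function names `bow_vector`, `cosine_similarity`, and `bow_similarity` are all illustrative assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Lowercase the text and count alphanumeric tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    # Counter returns 0 for tokens it has not seen, so only shared tokens contribute.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


def bow_similarity(sentence_1, sentence_2):
    """Score a sentence pair with raw-count BoW vectors (hypothetical helper)."""
    return cosine_similarity(bow_vector(sentence_1), bow_vector(sentence_2))


if __name__ == "__main__":
    pair = ("Net income rose in the third quarter.",
            "Net income fell in the third quarter.")
    print(round(bow_similarity(*pair), 3))
```

In an STS evaluation, scores like these would typically be compared against human similarity labels with a rank correlation. The example pair also shows why lexical overlap alone can be misleading: the two sentences share most of their tokens yet state opposite outcomes.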

Paper Structure

This paper contains 30 sections, 1 equation, 2 figures, 14 tables.

Figures (2)

  • Figure 1: An overview of tasks and datasets used in FinMTEB. All the dataset descriptions and examples are provided in the Appendix.
  • Figure 2: Semantic similarity across all the datasets in the FinMTEB benchmark.