Language Modeling for the Future of Finance: A Survey into Metrics, Tasks, and Data Opportunities

Nikita Tatarinov, Siddhant Sukhani, Agam Shah, Sudheer Chava

TL;DR

Language Modeling for the Future of Finance surveys NLP research in finance across 2017–2024, revealing a shift toward foundation models while highlighting gaps in finance-specific evaluation, crisis robustness, multilingual data, and openness. By classifying work into four task categories and analyzing data sources, metrics, and accessibility, the study identifies concrete opportunities: expand forecasting tasks, adopt finance-focused metrics such as Sharpe Ratio and Maximum Drawdown, incorporate crisis-period data for stress testing, and develop richer multilingual and multimodal datasets. It also argues for balancing pre-trained language models (PLMs) with interpretable and efficient alternatives suited to regulatory and latency constraints, and emphasizes reproducibility through open resources. Collectively, the findings offer a practical roadmap for researchers and practitioners to build more robust, transparent, and globally applicable NLP solutions in finance.
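To make the two finance-focused metrics named above concrete, here is a minimal sketch of how Sharpe Ratio and Maximum Drawdown could be computed from a series of per-period strategy returns. It is an illustration under stated assumptions, not the survey's own evaluation code: the function names, the default of 252 trading periods per year, the constant per-period risk-free rate, and the use of the sample standard deviation are all choices made here for the example.

```python
import math
from typing import Sequence


def sharpe_ratio(
    returns: Sequence[float],
    risk_free_rate: float = 0.0,   # assumed constant per-period risk-free rate
    periods_per_year: int = 252,   # assumed daily data; use 12 for monthly, etc.
) -> float:
    """Annualized Sharpe Ratio: mean excess return over its sample std dev."""
    excess = [r - risk_free_rate for r in returns]
    n = len(excess)
    if n < 2:
        raise ValueError("need at least two returns to estimate volatility")
    mean = sum(excess) / n
    variance = sum((x - mean) ** 2 for x in excess) / (n - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        raise ValueError("returns have zero volatility; Sharpe Ratio is undefined")
    return mean / std * math.sqrt(periods_per_year)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of cumulative wealth, as a fraction of the peak."""
    wealth = 1.0   # cumulative wealth, starting from one unit
    peak = 1.0     # running maximum of wealth seen so far
    worst = 0.0    # largest drawdown seen so far
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, (peak - wealth) / peak)
    return worst
```

A forecasting model would typically be scored by converting its predictions into trading positions, computing the resulting per-period returns, and passing that series to both functions, so that a model with good classification accuracy but poor risk-adjusted performance is not mistaken for a useful one.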

Abstract

Recent advances in language modeling have led to a growing number of papers related to finance in top-tier Natural Language Processing (NLP) venues. To systematically examine this trend, we review 374 NLP research papers published between 2017 and 2024 across 38 conferences and workshops, with a focused analysis of 221 papers that directly address finance-related tasks. We evaluate these papers across 11 quantitative and qualitative dimensions, and our study identifies the following opportunities for NLP researchers: (i) expanding the scope of forecasting tasks; (ii) enriching evaluation with financial metrics; (iii) leveraging multilingual and crisis-period datasets; and (iv) balancing PLMs with efficient or interpretable alternatives. We identify actionable directions supported by dataset and tool recommendations, with implications for both academic and industry communities.

Paper Structure

This paper contains 35 sections, 7 figures, 12 tables.

Figures (7)

  • Figure 1: Overview of our paper selection process and analysis dimensions. We collected papers from a broad range of NLP venues using abstract-level keyword filtering, yielding 374 candidates. After removing mismatches and shared task papers, we retained 221 papers, categorized into four groups by their connection to financial tasks.
  • Figure 2: Distribution of primary tasks across categories. Each cell shows the task name and paper count (e.g., "Sentiment Analysis (3)"), with color gradients indicating frequency -- darker shades represent more papers. "Miscellaneous" groups tasks that appear only once within Categories II and III.
  • Figure 3: Distribution of evaluation metrics used in Category I papers. Most rely on ML-based metrics, while only a few financial metrics appear repeatedly.
  • Figure 4: Data year distribution in financial forecasting papers, annotated with major financial events and infrastructure milestones. The distribution highlights the underuse of crisis periods despite their importance for model robustness.
  • Figure 5: Trends in code and dataset availability, highlighting the shift toward open-source practices and the growing accessibility of NLP resources for finance.
  • ...and 2 more figures