Language Modeling for the Future of Finance: A Survey into Metrics, Tasks, and Data Opportunities
Nikita Tatarinov, Siddhant Sukhani, Agam Shah, Sudheer Chava
TL;DR
Language Modeling for the Future of Finance surveys NLP research in finance from 2017 to 2024, revealing a shift toward foundation models while highlighting gaps in finance-specific evaluation, crisis robustness, multilingual data, and openness. By classifying work into four task categories and analyzing data sources, metrics, and accessibility, the study identifies concrete opportunities: expand forecasting tasks, adopt finance-focused metrics such as the Sharpe ratio and maximum drawdown, incorporate crisis-period data for stress testing, and develop richer multilingual and multimodal datasets. It also argues for balancing pretrained language models (PLMs) with interpretable and efficient alternatives suited to regulatory and latency constraints, and emphasizes reproducibility through open resources. Together, the findings offer a practical roadmap for researchers and practitioners to build more robust, transparent, and globally applicable NLP solutions in finance.
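As an illustration of the two finance-focused metrics named above, the following is a minimal Python sketch using their standard textbook definitions. It is not taken from the survey: the function names, the default of 252 trading periods per year, the use of simple per-period returns, and the sample standard deviation are all assumptions made for this example.

```python
import math


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period simple returns.

    risk_free_rate is an annual rate (assumption); it is spread evenly
    across periods before computing excess returns.
    """
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    n = len(excess)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(excess) / n
    # Sample variance (n - 1 in the denominator).
    variance = sum((x - mean) ** 2 for x in excess) / (n - 1)
    std = math.sqrt(variance)
    if std == 0:
        raise ValueError("returns have zero volatility")
    return mean / std * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of cumulative wealth, as a positive fraction."""
    wealth = 1.0  # start from one unit of capital
    peak = 1.0
    worst = 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, (peak - wealth) / peak)
    return worst
```

In an evaluation setting, `returns` would be the per-period returns of a strategy driven by a model's predictions, so that a forecasting model is judged on risk-adjusted performance and downside risk rather than on classification accuracy alone.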
Abstract
Recent advances in language modeling have led to a growing number of papers related to finance in top-tier Natural Language Processing (NLP) venues. To systematically examine this trend, we review 374 NLP research papers published between 2017 and 2024 across 38 conferences and workshops, with a focused analysis of 221 papers that directly address finance-related tasks. We evaluate these papers across 11 quantitative and qualitative dimensions, and our study identifies the following opportunities for NLP researchers: (i) expanding the scope of forecasting tasks; (ii) enriching evaluation with financial metrics; (iii) leveraging multilingual and crisis-period datasets; and (iv) balancing pretrained language models (PLMs) with efficient or interpretable alternatives. We identify actionable directions supported by dataset and tool recommendations, with implications for both the academic and industry communities.
