Open FinLLM Leaderboard: Towards Financial AI Readiness

Shengyuan Colin Lin; Felix Tian; Keyi Wang; Xingjian Zhao; Jimin Huang; Qianqian Xie; Luca Borella; Matt White; Christina Dan Wang; Kairong Xiao; Xiao-Yang Liu Yanglet; Li Deng

TL;DR

This work presents the Open FinLLM Leaderboard, which addresses the need for robust, open benchmarks of FinLLMs and FinAgents. Built in collaboration with the Linux Foundation and Hugging Face, the platform evaluates multimodal, finance-focused tasks through a zero-shot testing pipeline on 42 datasets across seven categories, and normalizes scores to a 0–100 scale via $\overline{S} = \frac{S - \min}{\max - \min} \times 100$. The system includes demonstrations such as the FinGPT Search Agent, a privacy layer based on zero-knowledge proofs, and application-specific leaderboards that illustrate practical deployment and readiness. By combining transparent metrics, side-by-side qualitative comparisons, and privacy-preserving verification, the leaderboard aims to accelerate financial AI readiness for industry, regulators, and the public.
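The min-max normalization in the formula above can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard's actual code: the function name `normalize_scores`, the sample scores, and the handling of the degenerate case where all scores are equal are assumptions, as is the choice to take $\min$ and $\max$ over the list of scores being compared.

```python
from typing import List, Sequence


def normalize_scores(raw_scores: Sequence[float]) -> List[float]:
    """Min-max normalize raw scores onto a 0-100 scale.

    Implements S_bar = (S - min) / (max - min) * 100, where min and max
    are taken over `raw_scores` (an assumption for this sketch).
    """
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        # Assumption: the formula is undefined when max == min,
        # so this sketch maps every score to 0.0 in that case.
        return [0.0 for _ in raw_scores]
    return [(s - lo) / (hi - lo) * 100 for s in raw_scores]


# Hypothetical raw scores for three models on one task.
raw = [40.0, 55.0, 70.0]
print(normalize_scores(raw))  # [0.0, 50.0, 100.0]
```

Under this scheme the lowest score in the compared set maps to 0 and the highest to 100, so a normalized score describes relative standing within that set and not an absolute level of accuracy.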

Abstract

Financial large language models (FinLLMs) with multimodal capabilities are envisioned to revolutionize applications across business, finance, accounting, and auditing. However, real-world adoption requires robust benchmarks of FinLLMs' and FinAgents' performance. Maintaining an open leaderboard is crucial for encouraging innovative adoption and improving model effectiveness. In collaboration with the Linux Foundation and Hugging Face, we create an open FinLLM leaderboard, which serves as an open platform for assessing and comparing AI models' performance on a wide spectrum of financial tasks. By democratizing access to advances in financial knowledge and intelligence, a chatbot or agent may enhance the analytical capabilities of the general public to a professional level within a few months of usage. This open leaderboard welcomes contributions from academia, the open-source community, industry, and stakeholders. In particular, we encourage contributions of new datasets, tasks, and models for continual updates. By fostering a collaborative and open ecosystem, we seek to promote financial AI readiness.

Paper Structure (52 sections, 24 figures, 4 tables)

Introduction
Related Works
Development of FinLLMs
Benchmarking FinLLMs
Open FinLLM Leaderboard
Overview
Financial Tasks with Multimodal Data
Testing Pipeline
Structure
Demos and Use Scenarios
App Demo: Search Agent
Web Demo
Zero-Knowledge Proof
Use Scenarios
Refining Questions for Legal Consultations
...and 37 more sections

Figures (24)

Figure 1: A screenshot of the Open FinLLM Leaderboard. The top 11 models are ranked across 7 financial tasks.
Figure 2: Example of task selection, allowing users to browse tasks under different financial categories.
Figure 3: Testing pipeline currently used in the FinLLM leaderboard.
Figure 4: Demo of the FinGPT Search Agent [Felix2024FinGPTAgent]: users can check information sources.
Figure 5: Global performance overview showing weighted average scores across all scenarios.
...and 19 more figures
