A Report on Financial Regulations Challenge at COLING 2025

Keyi Wang, Jaisal Patel, Charlie Shen, Daniel Kim, Andy Zhu, Alex Lin, Luca Borella, Cailean Osborne, Matt White, Steve Yang, Kairong Xiao, Xiao-Yang Liu Yanglet

TL;DR

The paper presents the Regulations Challenge at COLING 2025, a benchmark that evaluates financial large language models (FinLLMs) on regulatory compliance across nine tasks spanning information retrieval, ethics-certification, the Common Domain Model (CDM), the Model Openness Framework (MOF), and XBRL analytics. It reports on task design, data sources, participant methods, and evaluation results from 25 registered teams. Larger models with reasoning enhancements (e.g., GPT-4o, Mistral Large 2) tend to perform better, while weaknesses persist in abbreviation recognition and link retrieval. Key contributions include a detailed task suite, baselines, and an analysis of strengths and weaknesses across regulatory-domain NLP tasks, with practical guidance for building regulation-aware FinLLMs. The work also outlines plans for ongoing challenges and leaderboard integration to improve the practical readiness and reliability of FinLLMs in regulated finance applications.

Abstract

Financial large language models (FinLLMs) have been applied to various tasks in business, finance, accounting, and auditing. Financial services depend on complex regulations and standards, and LLMs used in this domain must comply with them. However, FinLLMs' performance in understanding and interpreting financial regulations has rarely been studied. Therefore, we organize the Regulations Challenge, a shared task at COLING 2025. It encourages the academic community to explore the strengths and limitations of popular LLMs. We create 9 novel tasks and corresponding question sets. In this paper, we provide an overview of these tasks and summarize participants' approaches and results. We aim to raise awareness of FinLLMs' professional capability in financial regulations.

Paper Structure

This paper contains 13 sections, 4 tables.