FINCH: Financial Intelligence using Natural language for Contextualized SQL Handling

Avinash Kumar Singh, Bhaskarjit Sarmah, Stefano Pasquali

TL;DR

FINCH addresses the scarcity of finance-specific Text-to-SQL benchmarks by introducing a large-scale finance-focused NL–SQL dataset and a finance-aware evaluation metric, the FINCH Score. It consolidates resources into 33 finance databases (292 tables, 2,233 columns, 177 relations) with 75,725 NL–SQL pairs, enabling both fine-tuning and robust evaluation. The study benchmarks a spectrum of models from large LLMs to reasoning-focused systems, finding that domain-specific fine-tuning can rival or exceed the performance of much larger models, and that the FINCH Score better captures practical financial correctness than traditional metrics. This work provides a foundation for reliable, domain-aware Text-to-SQL in finance and outlines future directions for multi-modal data and improved schema grounding.

Abstract

Text-to-SQL, the task of translating natural language questions into SQL queries, has long been a central challenge in NLP. While progress has been significant, applying it to the financial domain remains especially difficult due to complex schemas, domain-specific terminology, and the high stakes of errors. Despite this, there is no dedicated large-scale financial dataset to advance research, creating a critical gap. To address this, we introduce a curated financial dataset (FINCH) comprising 292 tables and 75,725 natural language-SQL pairs, enabling both fine-tuning and rigorous evaluation. Building on this resource, we benchmark reasoning models and language models of varying scales, providing a systematic analysis of their strengths and limitations in financial Text-to-SQL tasks. Finally, we propose a finance-oriented evaluation metric (FINCH Score) that captures nuances overlooked by existing measures, offering a more faithful assessment of model performance.

Paper Structure

This paper contains 10 sections, 4 equations, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Representation of the FINCH dataset showing the integration of different databases and tables across financial domains.