
FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis

Chao Zhang, Yuren Mao, Yijiang Fan, Yu Mi, Yunjun Gao, Lu Chen, Dongfang Lou, Jinshu Lin

TL;DR

FinSQL tackles the lack of practical financial Text-to-SQL benchmarks and the challenges of adapting open-source LLMs to wide financial schemas. It introduces BULL, a realistic finance-focused dataset, and FinSQL, a model-agnostic framework built around prompt construction, LoRA-based PEFT with a plugin hub, and output calibration. Key ideas include hybrid data augmentation, parallel Cross-Encoder schema linking, and a weights-merging strategy to enable few-shot cross-database transfer while preserving privacy via open-source models. Experiments on BULL demonstrate state-of-the-art performance and substantial gains in few-shot transfer, underscoring the framework’s practical impact for financial analysis and real-world deployment.

Abstract

Text-to-SQL, which provides a zero-code interface for operating relational databases, has gained much attention in financial analysis because financial professionals may not be well-skilled in SQL programming. However, until now, there has been no practical Text-to-SQL benchmark dataset for financial analysis, and existing Text-to-SQL methods have not considered the unique characteristics of databases in financial applications, such as the prevalence of wide tables. To address these issues, we collect a practical Text-to-SQL benchmark dataset and propose a model-agnostic Large Language Model (LLM)-based Text-to-SQL framework for financial analysis. The benchmark dataset, BULL, is collected from the practical financial analysis business of Hundsun Technologies Inc. and includes databases for funds, stocks, and the macro economy. The proposed LLM-based Text-to-SQL framework, FinSQL, provides a systematic treatment of financial Text-to-SQL from the perspectives of prompt construction, parameter-efficient fine-tuning, and output calibration. Extensive experimental results on BULL demonstrate that FinSQL achieves state-of-the-art Text-to-SQL performance at a small cost; furthermore, FinSQL brings up to a 36.64% performance improvement in scenarios requiring few-shot cross-database model transfer.

Paper Structure

This paper contains 35 sections, 5 equations, 13 figures, 9 tables, and 1 algorithm.

Figures (13)

  • Figure 1: Overview of the FinSQL framework. In the training stage, the training data is first augmented with a hybrid data augmentation method. The augmented data is then used to train LoRA plugins that handle various Text-to-SQL tasks; these plugins are managed by a LoRA plugin hub. In the inference stage, schema linking is first conducted to obtain a concise prompt, which is then fed into an LLM consisting of a base model and a LoRA module constructed from merged LoRA plugins. Finally, the output of the LLM is calibrated to ensure the correctness of the model output. The ice and fire icons in the picture denote freezing and updating model weights, respectively. The numbered markers indicate the process steps of training and inference, respectively.
  • Figure 2: An introduction to the BULL databases. The English and Chinese versions of BULL share the same database structure. #Tab Num is the number of tables in the database. #Avg Col and #Max Col are the average and maximum numbers of columns per table within the database.
  • Figure 3: An example from the BULL dataset
  • Figure 4: The overview of CoT generation based on self-check
  • Figure 5: The CoT prompt template. The words in red are the input variables. Here we need to provide the question, schema information, golden SQL and one-shot example to fill the template.
  • ...and 8 more figures
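
The Figure 1 caption mentions a LoRA module "constructed with merged LoRA plugins" but this summary does not say how the merge is computed. As a rough illustration only, the sketch below assumes the simplest possible scheme: each plugin contributes a low-rank update B·A, and the merged weight is the frozen base weight plus a weighted sum of those updates. The function names (`matmul`, `merge_lora_plugins`), the mixing weights, and the omission of the usual LoRA scaling factor are all assumptions made for this example, not the authors' method.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (m x r) matrix by an (r x n) matrix using plain lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora_plugins(
    base: Matrix,
    plugins: Sequence[Tuple[Matrix, Matrix]],
    weights: Sequence[float],
) -> Matrix:
    """Return base + sum_i weights[i] * (B_i @ A_i).

    Hypothetical merge rule (an assumption for illustration): a weighted
    sum of each plugin's low-rank update, added to the frozen base weight.
    """
    if len(plugins) != len(weights):
        raise ValueError("need exactly one mixing weight per plugin")
    merged = [row[:] for row in base]  # copy so the base weights stay frozen
    for (b, a), w in zip(plugins, weights):
        delta = matmul(b, a)  # low-rank update contributed by this plugin
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += w * value
    return merged


# Toy example: a 2x2 base weight and two rank-1 plugins mixed equally.
base_weight = [[1.0, 0.0], [0.0, 1.0]]
plugin_a = ([[1.0], [0.0]], [[0.0, 2.0]])  # (B, A), update lands in row 0
plugin_b = ([[0.0], [1.0]], [[4.0, 0.0]])  # (B, A), update lands in row 1
merged_weight = merge_lora_plugins(base_weight, [plugin_a, plugin_b], [0.5, 0.5])
print(merged_weight)
```

In a few-shot cross-database setting, the mixing weights would presumably favour plugins trained on databases similar to the target one; how those weights are chosen is left open here.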