CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model

Yang Lei, Jiangtong Li, Dawei Cheng, Zhijun Ding, Changjun Jiang

TL;DR

This work introduces CFBenchmark, a Chinese financial assistant benchmark for large language models, designed to evaluate basic Chinese financial text processing across recognition, classification, and generation. The basic version (CFBenchmark-Basic) comprises 3917 texts (50–1800+ characters) from financial news and reports and uses zero-shot and few-shot evaluations on 22 LLMs, revealing substantial room for improvement in financial text understanding and generation. The study highlights task-specific strengths among models (e.g., Qwen-Chat-14B in recognition; Baichuan2-13B-Base in classification; Baichuan2-13B-Chat in generation) and demonstrates the efficacy of domain-adapted models like CFGPT variants. The authors propose a roadmap for an advanced CFBenchmark and release their code, aiming to drive progress in Chinese financial LLMs and practical financial AI assistants.

Abstract

Large language models (LLMs) have demonstrated great potential in the financial domain, so it is important to assess their performance on financial tasks. In this work, we introduce CFBenchmark to evaluate the performance of LLMs as Chinese financial assistants. The basic version of CFBenchmark evaluates basic ability in Chinese financial text processing from three aspects (i.e., recognition, classification, and generation) spanning eight tasks, and includes financial texts ranging in length from 50 to over 1,800 characters. We conduct experiments on several LLMs available in the literature with CFBenchmark-Basic. The results indicate that while some LLMs show outstanding performance on specific tasks, overall there is still significant room for improvement on basic financial text processing tasks with existing models. In the future, we plan to develop an advanced version of CFBenchmark to explore the broader capabilities of language models as Chinese financial assistants in more depth. Our code is released at https://github.com/TongjiFinLab/CFBenchmark.

Paper Structure (19 sections, 5 figures, 2 tables)

This paper contains 19 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Overview of CFBenchmark-Basic, which assesses systems on financial entity recognition, financial text classification, and financial content generation.
  • Figure 2: An example of our question context. For each aspect of financial entity recognition, financial text classification, and financial content generation, a representative case is presented.
  • Figure 3: We illustrate the distribution of text lengths among these 3917 financial texts, where the length is calculated based on Chinese characters, starting from 50 and increasing in intervals of 300.
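
The length binning described in the Figure 3 caption (lengths counted in characters, bins starting at 50 and widening in steps of 300) can be sketched as below. This is a minimal illustration, not the authors' code: the function names `length_bin` and `length_histogram` are hypothetical, and it assumes every character in a text counts toward its length, which may differ from how the paper counts Chinese characters.

```python
from collections import Counter


def length_bin(text: str, start: int = 50, width: int = 300) -> int:
    """Return the lower edge of the length bin that `text` falls into.

    Assumption: length is the number of Unicode code points, so each
    Chinese character counts once (as do digits and punctuation).
    """
    n = len(text)
    if n < start:
        raise ValueError(f"text has {n} characters, below the minimum of {start}")
    # Bins are [50, 350), [350, 650), [650, 950), ...
    return start + ((n - start) // width) * width


def length_histogram(texts, start: int = 50, width: int = 300) -> dict:
    """Count how many texts fall into each length bin, ordered by bin edge."""
    counts = Counter(length_bin(t, start, width) for t in texts)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical sample texts built by repeating one character.
    sample = ["金" * 60, "金" * 349, "金" * 350, "金" * 1850]
    for edge, count in length_histogram(sample).items():
        print(f"[{edge}, {edge + 300}): {count}")
```

Applied to a list of benchmark texts, `length_histogram` yields the per-bin counts that a bar chart like Figure 3 would plot.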