Kuaiji: the First Chinese Accounting Large Language Model

Jiayuan Luo, Songhua Yang, Xiaoling Qiu, Panyu Chen, Yufei Nai, Wenxuan Zeng, Wentao Zhang, Xinke Jiang

TL;DR

Kuaiji is a tailored Accounting Large Language Model, fine-tuned using the Baichuan framework through continuous pre-training and supervised fine-tuning.

Abstract

Large Language Models (LLMs) like ChatGPT and GPT-4 have demonstrated impressive proficiency in comprehending and generating natural language. However, they encounter difficulties when adapting to specialized domains such as accounting. To address this challenge, we introduce Kuaiji, a tailored Accounting Large Language Model. Kuaiji is fine-tuned using the Baichuan framework, which encompasses continuous pre-training and supervised fine-tuning. Supported by CAtAcctQA, a large dataset of genuine accountant-client dialogues, Kuaiji exhibits exceptional accuracy and response speed. Our contributions are the creation of the first Chinese accounting dataset, the establishment of Kuaiji as a leading open-source Chinese accounting LLM, and the validation of its efficacy in real-world accounting scenarios.

Paper Structure (17 sections, 2 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Statistics on the distribution of the Kuaiji training dataset.
  • Figure 2: (Left) SFT dataset word count and $\log_{10}$ frequency (Query). (Right) SFT dataset word count and $\log_{10}$ frequency (Answer).
  • Figure 3: The overall flowchart of constructing Kuaiji.
  • Figure 4: Kuaiji single-turn Q&A test (in Chinese).
  • Figure 5: Kuaiji multi-turn Q&A test (in Chinese).