JaFIn: Japanese Financial Instruction Dataset
Kota Tanabe, Masahiro Suzuki, Hiroki Sakaji, Itsuki Noda
TL;DR
JaFIn addresses the scarcity of Japanese financial instruction data with a manually constructed 1,490-sample dataset drawn from Japanese government websites and other public sources, intended for instruction tuning of LLMs in the finance domain. The authors apply LoRA-based instruction tuning to several LLMs and evaluate the resulting finance-specialized models on a quantitative Japanese financial benchmark. Gains vary by model and task, but overall they support domain adaptation through instruction tuning for Japanese finance. Qualitative comparisons of responses also indicate better factual alignment and response quality after tuning, and the authors acknowledge biases and coverage limitations in the data. The work offers a dataset suitable for public release to accelerate development of Japanese financial conversational AI, and it informs future directions in finance-specific pre-training and bias mitigation.
Abstract
We construct an instruction dataset for large language models (LLMs) in the Japanese finance domain. Domain adaptation of language models, including LLMs, is receiving more attention as language models become more popular. This study demonstrates the effectiveness of domain adaptation through instruction tuning. To achieve this, we propose JaFIn, the Japanese Financial Instruction Dataset, an instruction tuning dataset in Japanese. JaFIn is manually constructed from multiple data sources, including Japanese government websites, which provide extensive financial knowledge. We then use JaFIn to apply instruction tuning to several LLMs, demonstrating that our finance-specialized models have better domain adaptability than the original models. We evaluate the resulting models using a quantitative Japanese financial benchmark and qualitative response comparisons, and both show improved performance over the originals.
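To make the idea of an instruction dataset concrete, the following is a minimal sketch of how a single instruction sample might be represented and rendered into a training prompt. The record fields (`instruction`, `context`, `output`), the `build_prompt` helper, and the prompt template are all hypothetical and chosen for illustration; the actual JaFIn schema and the template used by the authors are not specified in this section, and the sample text is a placeholder, not an entry from the dataset.

```python
from dataclasses import dataclass


@dataclass
class InstructionSample:
    """One instruction-tuning record (hypothetical schema, not JaFIn's)."""
    instruction: str   # the question or task given to the model
    output: str        # the reference answer
    context: str = ""  # optional supporting text; empty when unused


def build_prompt(sample: InstructionSample, include_answer: bool = True) -> str:
    """Render a sample into a single prompt string.

    The section headers below are an assumed, Alpaca-style template.
    With include_answer=False the prompt ends at the response header,
    which is the form one would use at inference time.
    """
    parts = ["### Instruction:\n" + sample.instruction]
    if sample.context:
        parts.append("### Input:\n" + sample.context)
    answer = sample.output if include_answer else ""
    parts.append("### Response:\n" + answer)
    return "\n\n".join(parts)


# Placeholder content for illustration only.
example = InstructionSample(
    instruction="Explain what a government bond is.",
    output="A government bond is a debt security issued by a government.",
)
training_text = build_prompt(example)
inference_text = build_prompt(example, include_answer=False)
```

In a setup like this, `training_text` would be tokenized and used as a supervised fine-tuning example, while `inference_text` would be given to the tuned model so that it generates the response itself.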
