Enhancing Financial Domain Adaptation of Language Models via Model Augmentation

Kota Tanabe; Masanori Hirano; Kazuki Matoya; Kentaro Imajo; Hiroki Sakaji; Itsuki Noda

TL;DR

The paper addresses financial-domain adaptation of large language models by proposing CALM, a cross-attention-based model augmentation that links an anchor LLM with a finance-specialized augmenting LLM while keeping both base models frozen. It defines the cross-attention mechanism as $f_{cross}(H_{A_i}, H_{B_j})$ and the fused representation as $H_{A_i \oplus B_j} = H_{B_j} + f_{cross}(H_{A_i}, H_{B_j})$, which enables selective integration of financial knowledge without full re-training. Experiments pair nekomata-14b-instruction (anchor) with nekomata-14b-pfn-qfin (augmenting) and train CALM on a Japanese Financial Instruction Dataset. CALM outperforms both individual models and LoRA on a 360-dialogue, 12-task benchmark, and connecting the models at their middle layers yields the strongest gains. The work demonstrates CALM's practicality for finance-focused dialogue systems: adaptation remains robust even when CALM's training data differs from the data used to train the augmenting model. It also points to the choice of connection points between composed LLMs as a direction for further optimization. The approach is relevant to industry applications that need reliable financial reasoning and generation without extensive re-training of large models.
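The fusion rule $H_{A_i \oplus B_j} = H_{B_j} + f_{cross}(H_{A_i}, H_{B_j})$ can be illustrated with a minimal single-head sketch in plain Python. This is not the authors' implementation. It assumes the usual cross-attention convention that queries come from $H_{B_j}$ and keys and values come from $H_{A_i}$. The projection matrices `W_q`, `W_k`, `W_v` and all function names are hypothetical, introduced only for illustration.

```python
import math


def matmul(X, W):
    """Multiply an (n x d) matrix by a (d x k) matrix, both given as nested lists."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)] for row in X]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(H_A, H_B, W_q, W_k, W_v):
    """f_cross sketch. Assumption: queries from H_B, keys and values from H_A."""
    Q = matmul(H_B, W_q)                      # (n_B x d)
    K = matmul(H_A, W_k)                      # (n_A x d)
    V = matmul(H_A, W_v)                      # (n_A x d)
    scale = math.sqrt(len(Q[0]))
    K_T = [list(col) for col in zip(*K)]      # (d x n_A)
    scores = matmul(Q, K_T)                   # (n_B x n_A)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, V)                 # (n_B x d)


def fuse(H_A, H_B, W_q, W_k, W_v):
    """Residual fusion: H_B + f_cross(H_A, H_B)."""
    attn = cross_attention(H_A, H_B, W_q, W_k, W_v)
    return [[b + a for b, a in zip(row_b, row_a)] for row_b, row_a in zip(H_B, attn)]


# Toy example: two positions from layer i of model A, one position from layer j of model B.
identity = [[1.0, 0.0], [0.0, 1.0]]
H_A = [[1.0, 0.0], [0.0, 1.0]]
H_B = [[0.0, 0.0]]
fused = fuse(H_A, H_B, identity, identity, identity)
print(fused)  # [[0.5, 0.5]]
```

In the toy example the query is all zeros, so attention is uniform over the two positions of `H_A` and the fused state is `H_B` plus their average. In a setup with frozen base models, as described above, only the projection parameters of such a layer would be trained.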

Abstract

The domain adaptation of language models, including large language models (LLMs), has become increasingly important as the use of such models continues to expand. This study demonstrates the effectiveness of Composition to Augment Language Models (CALM) in adapting to the financial domain. CALM extends the capabilities of existing models by introducing cross-attention between two LLMs with different functions. In our experiments, we developed a CALM to improve the finance-domain performance of an LLM with strong response capabilities by leveraging a finance-specialized LLM. Notably, the CALM was trained on a financial dataset different from the one used to train the finance-specialized LLM, confirming CALM's ability to adapt to various datasets. The models were evaluated through quantitative Japanese financial benchmarks and qualitative response comparisons, demonstrating that CALM produces better responses and higher scores than the original models and baselines. Additionally, comparative experiments on connection points revealed that connecting the middle layers of the models is most effective in facilitating adaptation to the financial domain. These findings confirm that CALM is a practical approach for adapting LLMs to the financial domain.
