SPEAR-MM: Selective Parameter Evaluation and Restoration via Model Merging for Efficient Financial LLM Adaptation

Berkcan Kapusuzoglu, Supriyo Chakraborty, Renkun Ni, Stephen Rawls, Sambit Sahu

TL;DR

This work addresses catastrophic forgetting during financial-domain adaptation of large language models by introducing SPEAR-MM, a post-hoc framework that estimates layer-wise importance and selectively preserves or restores parameters via SLERP-based merging. The approach combines SNR-weighted change and singular value drop metrics to rank parameters, enabling three restoration policies (Conservative, Balanced, Aggressive) and avoiding full retraining. Empirical results on LLaMA-3.1-8B show SPEAR-MM achieves high general capability retention (up to 91.2% on average) while preserving most domain adaptation gains (≈94%), with substantial computational savings (~99% fewer GPU-hours per configuration). The method offers interpretable trade-offs suitable for regulated financial environments, enabling efficient, secure, and flexible deployment of domain-custom LLMs.

Abstract

Large language models (LLMs) adapted to financial domains often suffer from catastrophic forgetting of general reasoning capabilities essential for customer interactions and complex financial analysis. We introduce Selective Parameter Evaluation and Restoration via Model Merging (SPEAR-MM), a practical framework that preserves critical capabilities while enabling domain adaptation. Our method approximates layer-wise impact on external benchmarks through post-hoc analysis, then selectively freezes or restores transformer layers via spherical interpolation merging. Applied to LLaMA-3.1-8B for financial tasks, SPEAR-MM achieves 91.2% retention of general capabilities versus 69.7% for standard continual pretraining, while maintaining 94% of domain adaptation gains. The approach provides interpretable trade-off control and reduces computational costs by 90%, which is crucial for resource-constrained financial institutions.
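
The spherical interpolation (SLERP) step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each layer's weights are flattened into a plain list of floats and that an interpolation factor `t` selects between the base model (`t = 0`) and the adapted model (`t = 1`). The function name `slerp`, its parameters, and the fallback to linear interpolation for near-parallel vectors are illustrative assumptions, not the authors' implementation.

```python
import math


def slerp(w_base, w_adapted, t, eps=1e-8):
    """Spherically interpolate between two flattened weight vectors.

    t = 0.0 returns w_base (full restoration of the base layer);
    t = 1.0 returns w_adapted (the layer is kept as adapted).
    Names and defaults are hypothetical, chosen for illustration.
    """
    def lerp():
        # Plain linear interpolation, used when the angle is undefined or tiny.
        return [(1.0 - t) * a + t * b for a, b in zip(w_base, w_adapted)]

    norm_a = math.sqrt(sum(x * x for x in w_base))
    norm_b = math.sqrt(sum(x * x for x in w_adapted))
    if norm_a < eps or norm_b < eps:
        return lerp()

    # Angle between the two weight vectors, clamped for numerical safety.
    dot = sum(a * b for a, b in zip(w_base, w_adapted))
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        return lerp()

    # Standard SLERP coefficients applied to the original vectors.
    coeff_a = math.sin((1.0 - t) * omega) / sin_omega
    coeff_b = math.sin(t * omega) / sin_omega
    return [coeff_a * a + coeff_b * b for a, b in zip(w_base, w_adapted)]


midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

For two orthogonal unit vectors, the midpoint stays on the unit circle (both components equal sin(π/4)), whereas plain averaging would shrink the norm. That norm-preserving behaviour is one common motivation for using SLERP in weight merging.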

Paper Structure

This paper contains 17 sections, 1 equation, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: The SPEAR-MM Pipeline: (1) A base model is fine-tuned on financial data to create an adapted model. (2) Layer-wise SPEAR-MM scores are computed using SNR and change metrics from both models. (3) Parameters are grouped and ranked by their score. (4) A merge configuration is generated to create the final model, which is then (5) evaluated on validation sets to select the optimal configuration.
  • Figure 2: Domain Adaptation vs. Knowledge Retention Trade-off. The y-axis shows domain performance as a percentage of a non-adapted baseline. Our method enables precise control over the forgetting-learning balance, with each point representing a different freezing configuration. The resulting trade-off frontier is superior to that of the baselines.
  • Figure 3: Heatmap of normalized impact scores for each model component across 32 layers. White dots mark the top 50% of impactful layers within each component. The visualization highlights a distinct pattern in the MLP blocks (high impact at the network's boundaries) and a progressive increase in importance for the attention mechanism's value projection ($v_{proj}$) in deeper layers.
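
The grouping, ranking, and configuration steps described in the Figure 1 and Figure 3 captions can be sketched as follows. The sketch takes precomputed impact scores as input (how the scores are computed is not reproduced here) and assumes that, within each component, the top-scoring fraction of layers is marked for restoration to the base weights. The function `build_merge_config`, the `"restore"`/`"keep"` labels, and the `restore_fraction` parameter are hypothetical names for illustration, not the paper's interface.

```python
from collections import defaultdict


def build_merge_config(scores, restore_fraction=0.5):
    """Group scores by component, rank layers, and label each one.

    scores: dict mapping (component, layer_index) -> impact score.
    Returns a dict mapping (component, layer_index) -> "restore" or "keep".
    Assumption: a higher score marks a layer as a restoration candidate.
    """
    by_component = defaultdict(list)
    for (component, layer), score in scores.items():
        by_component[component].append((score, layer))

    config = {}
    for component, entries in by_component.items():
        # Highest-impact layers first; ties are broken by layer index.
        entries.sort(reverse=True)
        n_restore = int(len(entries) * restore_fraction)
        for rank, (_, layer) in enumerate(entries):
            label = "restore" if rank < n_restore else "keep"
            config[(component, layer)] = label
    return config


example_scores = {
    ("mlp", 0): 0.9,
    ("mlp", 1): 0.1,
    ("v_proj", 0): 0.2,
    ("v_proj", 1): 0.8,
}
example_config = build_merge_config(example_scores, restore_fraction=0.5)
```

With `restore_fraction=0.5`, the example marks `("mlp", 0)` and `("v_proj", 1)` for restoration and keeps the other two layers as adapted. Sweeping `restore_fraction` is one simple way to generate the family of configurations that a trade-off curve like the one in Figure 2 would compare.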