LAET: A Layer-wise Adaptive Ensemble Tuning Framework for Pretrained Language Models

Jawad Ibn Ahad, Muhammad Rafsan Kabir, Robin Krambroeckers, Sifat Momen, Nabeel Mohammed, Shafin Rahman

TL;DR

LAET tackles the heavy compute barrier of domain-specific LLMs in finance by using per-layer probing to identify the most impactful layers, fine-tuning only those layers, and aggregating their predictions via voting. The approach reduces training cost while delivering competitive or superior results against strong baselines, including GPT-4, across 23 finance-focused datasets spanning textual analysis, forecasting, and risk management. The findings show that small, carefully tuned LLMs can rival larger models when guided by layer-wise relevance and ensemble decision-making, with substantial layer reduction (up to 60%) without sacrificing accuracy. The work also offers practical insights into which layers are useful and which token representation to probe (the last token), along with a scalable pipeline for efficient financial NLP deployment across domains.
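The voting step can be illustrated with a small, dependency-free Python sketch. It assumes each selected layer has already produced one class label per example. The function names, the example labels, and the rule of breaking ties in favour of the earliest layer are all hypothetical, since the summary does not say how ties are resolved.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import List


def majority_vote(layer_predictions: Sequence[Hashable]) -> Hashable:
    """Return the label predicted by the most selected layers.

    Ties go to the label that appears first in layer order; this
    tie-breaking rule is an assumption, not taken from the paper.
    """
    if not layer_predictions:
        raise ValueError("need at least one layer prediction")
    counts = Counter(layer_predictions)
    top = max(counts.values())
    # Scan in layer order so the earliest label reaching the top count wins a tie.
    return next(label for label in layer_predictions if counts[label] == top)


def ensemble_predict(per_layer_preds: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Vote example by example; per_layer_preds[l][i] is layer l's label for example i."""
    return [majority_vote(votes) for votes in zip(*per_layer_preds)]


# Hypothetical labels from three selected layers on four examples.
preds = [
    ["pos", "neg", "neu", "pos"],  # first selected layer
    ["pos", "neg", "neg", "neg"],  # second selected layer
    ["neg", "neu", "neu", "pos"],  # third selected layer
]
print(ensemble_predict(preds))  # ['pos', 'neg', 'neu', 'pos']
```

Plain majority voting keeps the ensemble cheap at inference time: each selected layer contributes one label, and no extra weights have to be learned to combine them.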

Abstract

Natural Language Processing (NLP) has transformed the financial industry, enabling advances in areas such as textual analysis, risk management, and forecasting. Large language models (LLMs) like BloombergGPT and FinMA have set new benchmarks across various financial NLP tasks, including sentiment analysis, stock movement prediction, and credit risk assessment. FinMA-ES, a bilingual financial LLM, has also shown strong performance on the FLARE and FLARE-ES benchmarks. However, the high computational demands of these models put them out of reach for many organizations. To address this, we propose Layer-wise Adaptive Ensemble Tuning (LAET), a novel strategy that analyzes hidden-state representations to select and fine-tune the most effective layers of a pre-trained LLM while freezing less critical layers. LAET significantly reduces computational overhead while improving task-specific performance. Our approach achieves strong results on financial NLP tasks, surpassing prior results on existing benchmarks and outperforming state-of-the-art LLMs such as GPT-4, even with smaller LLMs ($\sim$3B parameters). This work bridges cutting-edge financial NLP research and real-world deployment through efficient, scalable models for financial applications.
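A minimal sketch of the select-and-freeze idea follows, assuming each layer has already been scored by a probe (for example, validation accuracy). The `tolerance` threshold on the gap to the best layer is a hypothetical simplification: the paper selects layers using several evaluation metrics and their deviations from the maximum values, and those details are not reproduced here. The helper names `select_layers` and `trainable_mask` and the probe scores are illustrative only.

```python
from typing import Dict, List, Tuple


def select_layers(scores: Dict[int, float], tolerance: float) -> Tuple[List[int], List[int]]:
    """Split layers into a fine-tuned set B and a frozen set.

    `scores` maps a layer index to its probe metric (higher is better).
    A layer joins B when its score is within `tolerance` of the best score;
    this single-threshold rule is a hypothetical stand-in for the paper's
    deviation-from-maximum criterion.
    """
    if not scores:
        raise ValueError("no layer scores given")
    best = max(scores.values())
    selected = sorted(layer for layer, s in scores.items() if best - s <= tolerance)
    chosen = set(selected)
    frozen = sorted(layer for layer in scores if layer not in chosen)
    return selected, frozen


def trainable_mask(num_layers: int, selected: List[int]) -> List[bool]:
    """True for layers that are fine-tuned, False for layers kept frozen."""
    chosen = set(selected)
    return [layer in chosen for layer in range(num_layers)]


# Hypothetical probe accuracies for an 8-layer toy model.
probe_scores = {0: 0.55, 1: 0.62, 2: 0.71, 3: 0.84, 4: 0.90, 5: 0.92, 6: 0.89, 7: 0.80}
B, frozen = select_layers(probe_scores, tolerance=0.05)
mask = trainable_mask(len(probe_scores), B)
print(B, frozen)  # [4, 5, 6] [0, 1, 2, 3, 7]
print(f"frozen fraction: {len(frozen) / len(probe_scores):.3f}")  # frozen fraction: 0.625
```

In a PyTorch model, such a mask would typically be applied by calling `param.requires_grad_(False)` on every parameter of a frozen layer, so only the selected layers and the classifier receive gradients; how layer indices map to modules depends on the model architecture.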

Paper Structure

This paper contains 12 sections, 15 equations, 4 figures, 6 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Comparison of full fine-tuning, parameter-efficient fine-tuning (PEFT), and our proposed Layer-wise Adaptive Ensemble Tuning (LAET). LAET identifies and updates only the most effective layers while keeping all remaining layers frozen, which substantially reduces computational overhead compared to full fine-tuning. At the same time, it achieves higher performance than conventional PEFT approaches (e.g., LoRA, DoRA), which tune only small adapter modules and later merge their weights.
  • Figure 2: LAET: (a) For each layer $l$ in a pre-trained model $\mathcal{M}$, representations $\mathbf{r}_i^{(l)}$ are extracted, logits $z_i^{(l)}$ are computed with the classifier $\mathcal{F}_\phi$, and a cross-entropy loss $\mathcal{L}_l$ is calculated. (b) The best-performing layers, forming the set $\mathcal{B}$, are then selected based on evaluation metrics and their deviations from the maximum values. Layers $l \notin \mathcal{B}$ are frozen, while the selected layers and the shared classifier are fine-tuned. During inference, a voting-based ensemble aggregates predictions from the selected layers to produce the final output.
  • Figure 3: Layer-wise probing: We evaluated three probing strategies: Last Token (LT), Sum of All Tokens (SaT), and Average of All Tokens (AvT). The results indicate that LT consistently outperforms both SaT and AvT in accuracy, making it the most reliable indicator of layer effectiveness. This observation motivates our choice of the LT strategy in the LAET implementation.
  • Figure 4: The 1st Std method selects 25–30 layers, while the proposed method selects 15–22, reducing the number of selected layers by 20–40% while keeping accuracy above 0.9, which demonstrates its efficiency.
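The per-layer probing step from Figures 2(a) and 3 can be sketched in plain Python. Hidden states are given as a list of token vectors per layer together with an attention mask; `last_token`, `sum_tokens`, and `avg_tokens` correspond to the LT, SaT, and AvT strategies, and a fixed linear classifier standing in for $\mathcal{F}_\phi$ produces logits $z^{(l)}$ that are scored with cross-entropy to give $\mathcal{L}_l$. All names, shapes, and toy values are assumptions for illustration; a real pipeline would read hidden states from the model (for example via Hugging Face Transformers' `output_hidden_states=True`) and train $\phi$ rather than fix it.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Pool = Callable[[Sequence[Vector], Sequence[int]], Vector]


def last_token(hidden: Sequence[Vector], mask: Sequence[int]) -> Vector:
    """LT: hidden state of the last non-padding token."""
    last = max(i for i, m in enumerate(mask) if m)
    return list(hidden[last])


def sum_tokens(hidden: Sequence[Vector], mask: Sequence[int]) -> Vector:
    """SaT: element-wise sum over non-padding tokens."""
    dim = len(hidden[0])
    return [sum(h[d] for h, m in zip(hidden, mask) if m) for d in range(dim)]


def avg_tokens(hidden: Sequence[Vector], mask: Sequence[int]) -> Vector:
    """AvT: element-wise mean over non-padding tokens."""
    n = sum(1 for m in mask if m)
    return [x / n for x in sum_tokens(hidden, mask)]


def linear_logits(r: Vector, weights: Sequence[Vector], bias: Vector) -> Vector:
    """Toy stand-in for the shared classifier F_phi: one weight row per class."""
    return [sum(w_d * r_d for w_d, r_d in zip(w, r)) + b for w, b in zip(weights, bias)]


def cross_entropy(logits: Vector, label: int) -> float:
    """Numerically stable -log softmax(logits)[label]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def probe_losses(layers: Sequence[Sequence[Vector]], mask: Sequence[int], label: int,
                 weights: Sequence[Vector], bias: Vector, pool: Pool = last_token) -> List[float]:
    """Loss L_l for every layer l: pool tokens -> logits z^(l) -> cross-entropy."""
    return [cross_entropy(linear_logits(pool(h, mask), weights, bias), label) for h in layers]


# Toy example: 2 layers, 4 tokens (the last one is padding), hidden size 2, 2 classes.
mask = [1, 1, 1, 0]
layers = [
    [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [9.0, 9.0]],  # layer 0
    [[0.0, 1.0], [1.0, 0.0], [0.0, 3.0], [9.0, 9.0]],  # layer 1
]
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
for name, pool in [("LT", last_token), ("SaT", sum_tokens), ("AvT", avg_tokens)]:
    print(name, [round(loss, 3) for loss in probe_losses(layers, mask, 0, W, b, pool)])
```

A lower probe loss (or higher held-out accuracy) marks a layer as a stronger candidate for the selected set $\mathcal{B}$; the toy numbers here are not meant to reproduce the comparison reported in Figure 3.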