Exploring Large Language Models for Financial Applications: Techniques, Performance, and Challenges with FinMA

Prudence Djagba, Abdelkader Y. Saley

TL;DR

This paper evaluates FinMA, a domain-adapted financial LLM developed within the PIXIU framework, using the FLARE benchmark to assess its capabilities across financial NLP and prediction tasks. It analyzes FinMA’s architecture and its instruction tuning on the FIT dataset, reporting strong performance in sentiment analysis and text classification but notable gaps in numerical reasoning, named entity recognition, and summarization. The study discusses FinMA’s open-source nature and its data and compute challenges, as well as practical implications for finance workflows and the need for robust evaluation methodologies. It concludes with directions for improving FinLLMs, including retrieval-augmented generation, targeted fine-tuning, and multimodal integration to better support financial decision-making.

Abstract

This research explores the strengths and weaknesses of domain-adapted Large Language Models (LLMs) in the context of financial natural language processing (NLP). The analysis centers on FinMA, a model created within the PIXIU framework, which is evaluated for its performance in specialized financial tasks. Recognizing the critical demands of accuracy, reliability, and domain adaptation in financial applications, this study examines FinMA's model architecture, its instruction tuning process utilizing the Financial Instruction Tuning (FIT) dataset, and its evaluation under the FLARE benchmark. Findings indicate that FinMA performs well in sentiment analysis and classification, but faces notable challenges in tasks involving numerical reasoning, entity recognition, and summarization. This work aims to advance the understanding of how financial LLMs can be effectively designed and evaluated to assist in finance-related decision-making processes.

Paper Structure

This paper contains 46 sections, 6 figures, 11 tables.

Figures (6)

  • Figure 1: Chronological progression of notable pre-trained language models and large language models from general-purpose applications to finance-specific implementations. [Source: Lee2024survey]
  • Figure 2: Timeline of key developments in financial natural language processing from 2017 to 2025.
  • Figure 3: Overview of financial NLP tasks and representative datasets for FinLLM evaluation, adapted from Chen2024. Under-explored tasks are highlighted in yellow.
  • Figure 4: Instruction tuning pipeline for financial datasets. [Source: Lee2024survey]
  • Figure 5: Zero-shot and Few-shot F1 Scores for Sentiment Analysis (FiQA-SA, FPB), News Headline Classification (Headlines), and Named Entity Recognition (Financial NER).
  • ...and 1 more figure