Fine-tuning of lightweight large language models for sentiment classification on heterogeneous financial textual data

Alvaro Paredes Amorin; Andre Python; Christoph Weisser

TL;DR

The paper evaluates lightweight open-source LLMs against FinBERT for financial sentiment analysis across English and Chinese sources, using a domain-balanced fine-tuning pipeline based on parameter-efficient fine-tuning (PEFT) with 4-bit quantization. It finds that Qwen3 8B and Llama3 8B Instruct often outperform FinBERT, even with minimal training data, highlighting strong zero- and few-shot capabilities. A domain-balanced training approach improves cross-domain performance and reduces interference, suggesting lightweight LLMs as cost-effective options for heterogeneous financial text. The study also outlines practical guidance for low-resource settings and notes avenues for future multilingual and retrieval-augmented generation (RAG) extensions.

Abstract

Large language models (LLMs) play an increasingly important role in financial market analysis by capturing signals from complex and heterogeneous textual data sources, such as tweets, news articles, reports, and microblogs. However, their performance depends on large computational resources and proprietary datasets, which are costly, restricted, and therefore inaccessible to many researchers and practitioners. To reflect realistic situations, we investigate the ability of lightweight open-source LLMs -- smaller, publicly available models designed to operate with limited computational resources -- to generalize sentiment understanding from financial datasets of varying sizes, sources, formats, and languages. We compare the benchmark finance natural language processing (NLP) model, FinBERT, with three open-source lightweight LLMs (DeepSeek-LLM 7B, Llama3 8B Instruct, and Qwen3 8B) on five publicly available datasets: FinancialPhraseBank, Financial Question Answering, Gold News Sentiment, Twitter Sentiment, and Chinese Finance Sentiment. We find that the LLMs, especially Qwen3 8B and Llama3 8B, perform best in most scenarios, even when using only 5% of the available training data. These results hold in zero-shot and few-shot learning scenarios. Our findings indicate that lightweight open-source LLMs constitute a cost-effective option, as they can achieve competitive performance on heterogeneous textual data even when trained on only a limited subset of the extensive annotated corpora that are typically deemed necessary.
