Optimizing Performance: How Compact Models Match or Exceed GPT's Classification Capabilities through Fine-Tuning

Baptiste Lefort, Eric Benhamou, Jean-Jacques Ohana, David Saltiel, Beatrice Guez

Abstract

In this paper, we demonstrate that small, non-generative models such as FinBERT and FinDRoBERTa, when fine-tuned, can outperform GPT-3.5 and GPT-4 used in zero-shot settings on sentiment analysis of financial news. These fine-tuned models also achieve results comparable to a fine-tuned GPT-3.5 on the task of determining market sentiment from daily financial news summaries sourced from Bloomberg. To fine-tune and compare these models, we created a novel database that assigns a market score to each piece of news without human interpretation bias, by systematically identifying the companies mentioned and checking whether their stocks have gone up, gone down, or remained neutral. Furthermore, the paper shows that the assumptions of Condorcet's Jury Theorem do not hold, which suggests that the fine-tuned small models are not independent of the fine-tuned GPT models and indicates behavioural similarities. Lastly, the resulting fine-tuned models are made publicly available on HuggingFace, providing a resource for further research in financial sentiment analysis and text classification.

Paper Structure (19 sections, 1 equation, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Time ordering of the automatic labelling steps. The headline is published at time $t$, the historical price distribution is gathered from $t-5$ years to $t$, and the next-day price return is measured from $t$ to $t+1$.
  • Figure 2: Automatic Classification of Financial Headlines. Blue blocks represent data processing steps, green blocks represent decision points for classification, and orange blocks represent the corresponding labels.
  • Figure 3: Full Process of Dataset Annotation.
  • Figure 4: Track record of the labelling validity. The y-axis is the cumulative return of the strategies and the x-axis is the date of signal computation. The higher the cumulative return, the more effective the classification.
  • Figure 5: Confusion matrices for each LLM on their ability to classify specific categories.
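
The time ordering described in Figure 1 can be illustrated with a minimal Python sketch. It assumes that a headline is labelled by comparing the next-day return (from $t$ to $t+1$) of the mentioned stock against the distribution of its historical daily returns (from $t-5$ years to $t$). The function name `label_headline`, the quantile cut-offs `lower_q` and `upper_q`, and the label strings are hypothetical choices made for illustration; the captions above do not state the thresholds actually used.

```python
from bisect import bisect_left, bisect_right


def label_headline(history_returns, next_day_return, lower_q=1 / 3, upper_q=2 / 3):
    """Label a headline from the next-day return of the mentioned stock.

    history_returns: daily returns over the look-back window (t - 5 years to t).
    next_day_return: return observed from t to t + 1.
    lower_q, upper_q: hypothetical quantile cut-offs (assumptions, not from the paper).
    """
    if not history_returns:
        raise ValueError("need at least one historical return")
    ordered = sorted(history_returns)
    n = len(ordered)
    # Empirical percentile rank of the next-day return within the history.
    # Averaging the left and right insertion points gives a mid-rank for ties.
    left = bisect_left(ordered, next_day_return)
    right = bisect_right(ordered, next_day_return)
    rank = (left + right) / (2 * n)
    if rank < lower_q:
        return "down"
    if rank > upper_q:
        return "up"
    return "neutral"


# Toy look-back window of daily returns (illustrative values only).
history = [-0.03, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03]
labels = [label_headline(history, r) for r in (0.05, -0.05, 0.0)]
```

Because the label depends only on observed prices, no human judgement enters the annotation, which matches the stated goal of avoiding interpretation bias. Changing the cut-offs changes how wide the neutral band is.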