Sentiment-driven prediction of financial returns: a Bayesian-enhanced FinBERT approach

Raffaele Giuseppe Cestari, Simone Formentin

TL;DR

This study shows that sentiment information extracted from tweets with the FinBERT large language model improves the prediction of financial returns, achieving an F1-score above 70% on the test set and higher cumulative profits in backtested trading.

Abstract

Accurately predicting financial returns is difficult because of the inherent uncertainty in financial time series data. Improving the performance of prediction models depends on capturing both social and financial sentiment effectively. In this study, we show that sentiment information extracted from tweets with the FinBERT large language model is effective for this task. We build the feature set through correlation analysis and select features automatically with Bayesian-optimized Recursive Feature Elimination; the resulting model surpasses existing methodologies, achieving an F1-score above 70% on the test set. This gain translates into higher cumulative profits in backtested trading. Our investigation uses real-world SPY ETF data and the corresponding tweets from the StockTwits platform.
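The correlation analysis mentioned in the abstract can be pictured with a minimal sketch: a Pearson correlation between a sentiment series and the return series at a chosen time lag. Everything here is an illustrative assumption rather than the authors' implementation: the function names `pearson` and `lagged_correlation`, the lag convention (sentiment on day t - lag against the return on day t), and the made-up toy numbers.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(sentiment, returns, lag):
    """Correlate sentiment on day t - lag with the return on day t (lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(sentiment, returns)
    # Drop the last `lag` sentiment values and the first `lag` returns
    # so that the two slices line up day by day.
    return pearson(sentiment[:-lag], returns[lag:])


# Toy, made-up data (not from the paper): each return is an exact negative
# multiple of the previous day's negative-sentiment score.
neg_sentiment = [0.1, 0.4, 0.2, 0.7, 0.3, 0.5, 0.6, 0.2]
returns = [0.0] + [-0.01 * s for s in neg_sentiment[:-1]]

lag1 = lagged_correlation(neg_sentiment, returns, 1)
print(f"lag-1 correlation: {lag1:.3f}")
```

Because the toy returns are constructed from the previous day's sentiment, the lag-1 coefficient is at the negative extreme; on real data the coefficients would be far weaker, and scanning several lags is what would suggest which lagged features are worth keeping.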

Paper Structure (10 sections, 6 equations, 4 figures, 4 tables, 1 algorithm)

This paper contains 10 sections, 6 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: FinBERT sentiment classification of tweets.
  • Figure 2: Upper panel: Pearson correlation coefficient between FinBERT negative sentiment and return for different time lags. Lower panel: Pearson autocorrelation function.
  • Figure 3: F1-score on $8$ batches of $10$ trading days each in the test set.
  • Figure 4: Upper panel: cumulative profit time series in the test set. Lower panel: SPY return.
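
The per-batch evaluation described in the Figure 3 caption can be sketched as follows. This is a hedged illustration, not the paper's code: it assumes binary up/down labels with `1` as the positive class, and the helper names `f1_score` and `batched_f1` as well as the synthetic labels are hypothetical.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1-score for the given positive class; 0.0 if it is undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def batched_f1(y_true, y_pred, batch_size=10):
    """F1-score on consecutive, non-overlapping batches of `batch_size` days."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    return [
        f1_score(y_true[i:i + batch_size], y_pred[i:i + batch_size])
        for i in range(0, len(y_true), batch_size)
    ]


# Synthetic labels (not from the paper): 80 trading days alternating up/down,
# with the prediction flipped on every seventh day.
y_true = [1, 0] * 40
y_pred = [1 - t if i % 7 == 0 else t for i, t in enumerate(y_true)]

scores = batched_f1(y_true, y_pred, batch_size=10)
print(len(scores), [round(s, 2) for s in scores])
```

Splitting the test set into short consecutive batches shows how stable the classifier is over time, which a single aggregate F1-score would hide.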