Enhancing Financial Market Predictions: Causality-Driven Feature Selection

Wenhao Liang, Zhengyang Li, Weitong Chen

TL;DR

The paper tackles the challenge of predicting financial market volatility by leveraging sentiment from financial news while ensuring reliability through calibration. It introduces FinSen, a temporally rich dataset spanning 2007–2023 with about 160K sentiment-annotated records across 197 countries, and uses Granger causality tests to show that sentiment carries predictive information about volatility. By integrating causally validated sentiment into a multi-layer LSTM and pairing it with a DAN 3 text classifier trained with a Focal Calibration Loss, the work reports improved forecast accuracy and substantially better calibration, with an Expected Calibration Error (ECE) of about 3.34%. The findings suggest that calibrated, causally informed sentiment signals can yield more trustworthy probabilistic forecasts, offering practical value for risk management and financial decision-making.
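To make the Granger step concrete, here is a minimal sketch of the idea behind such a test: regress volatility on its own lags, then add lagged sentiment and check whether the residual error drops by more than chance would explain. This is not the authors' implementation. The function names (`lagged`, `rss`, `granger_f_stat`) and the synthetic series are assumptions made for illustration, and the sketch returns only the F statistic rather than a p-value.

```python
import numpy as np


def lagged(series, lags):
    """Columns series[t-1], ..., series[t-lags] for t = lags .. n-1."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])


def rss(design, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f_stat(x, y, lags):
    """F statistic for the hypothesis 'lags of x do not help predict y'."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[lags:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, lags)])   # y's own history only
    full = np.hstack([restricted, lagged(x, lags)])   # plus x's history
    rss_restricted = rss(restricted, target)
    rss_full = rss(full, target)
    dof = len(target) - full.shape[1]
    return ((rss_restricted - rss_full) / lags) / (rss_full / dof)


# Synthetic stand-ins (assumption): volatility follows yesterday's sentiment.
rng = np.random.default_rng(0)
sentiment = rng.normal(size=400)
noise = rng.normal(scale=0.1, size=400)
volatility = np.empty(400)
volatility[0] = noise[0]
volatility[1:] = 0.8 * sentiment[:-1] + noise[1:]

forward = granger_f_stat(sentiment, volatility, lags=1)   # sentiment -> volatility
reverse = granger_f_stat(volatility, sentiment, lags=1)   # volatility -> sentiment
print(f"F(sentiment -> volatility) = {forward:.1f}")
print(f"F(volatility -> sentiment) = {reverse:.1f}")
```

A large F statistic in one direction and a small one in the other is the pattern a Granger test looks for. Turning the statistic into a p-value requires the F distribution with `lags` and `dof` degrees of freedom, which is omitted here. Note that Granger causality is a statement about predictive usefulness, not about causal mechanism.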

Abstract

This paper introduces the FinSen dataset that revolutionizes financial market analysis by integrating economic and financial news articles from 197 countries with stock market data. The dataset's extensive coverage spans 15 years from 2007 to 2023 with temporal information, offering a rich, global perspective with 160,000 records on financial market news. Our study leverages causally validated sentiment scores and LSTM models to enhance market forecast accuracy and reliability. Utilizing the FinSen dataset, we introduce an innovative Focal Calibration Loss, reducing Expected Calibration Error (ECE) to 3.34 percent with the DAN 3 model. This not only improves prediction accuracy but also aligns probabilistic forecasts closely with real outcomes, which is crucial for the financial sector, where predicted probability is paramount. Our approach demonstrates the effectiveness of combining sentiment analysis with precise calibration techniques for trustworthy financial forecasting, where the cost of misinterpretation can be high. The FinSen data can be found at [this GitHub URL](https://github.com/EagleAdelaide/FinSen_Dataset.git).
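For readers unfamiliar with the metric, ECE is commonly computed by grouping predictions into confidence bins and averaging the gap between each bin's accuracy and its mean confidence, weighted by bin size. The sketch below follows that common definition in plain Python. The function name, the choice of 10 equal-width bins, and the toy inputs are assumptions for illustration; the paper's exact binning scheme is not stated in this summary.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin size / N) * |accuracy - mean confidence|.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     whether each prediction was right.
    Bins are equal-width, half-open [a, b); a confidence of exactly 1.0
    is placed in the last bin.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, hit in members if hit) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


# Toy example (assumption): an overconfident classifier.
toy_confidences = [0.95, 0.95, 0.95, 0.95, 0.65, 0.65]
toy_correct = [True, True, True, False, True, False]
print(f"ECE = {expected_calibration_error(toy_confidences, toy_correct):.4f}")
```

An ECE of 0 means stated confidence matches observed accuracy in every bin. The reported 3.34% corresponds to an average gap of about 0.033 on this scale.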

Paper Structure

This paper contains 15 sections, 3 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Framework of LSTM Volatility Prediction and DAN 3 Text Classification. $X_4$ denotes the sentiment scores generated from the FinSen dataset by FinBERT; after being causally validated against volatility $y$, they serve as input to both models.
  • Figure 2: Granger Causality Test: $X$ is the aggregated sentiment score and $y$ is market volatility. Yellow dashed lines highlight prominent peaks and valleys, selected subjectively to illustrate instances of notable alignment between features.
  • Figure 3: The Proposed LSTM Experimental Flow Chart
  • Figure 4: Granger Causality Test Results, filtered to $p$-values $< 0.05$ at lags 1, 3, 7, 14, and 30.
  • Figure 5: Volatility Prediction Comparison by LSTM Model. The red dashed line is the prediction without the sentiment score; the blue dashed line is the prediction with it, which is closer to the actual volatility (black line).
  • ...and 1 more figures