Realised Volatility Forecasting: Machine Learning via Financial Word Embedding

Eghbal Rahimikia, Stefan Zohren, Ser-Huang Poon

TL;DR

This work investigates whether news-driven embeddings can improve realised volatility (RV) forecasting and whether such signals are complementary to traditional heterogeneous autoregressive (HAR) family models. It compares specialised financial embeddings (FinText) against general-purpose embeddings across stock-related and general news, evaluating standalone NLP forecasts and ensembles with HAR benchmarks, plus explainability via SHAP (SHapley Additive exPlanations). The findings show that stock-related news offers meaningful predictive content, especially on high-volatility days, and that simple ensembles of NLP forecasts with HAR models deliver statistically and economically meaningful gains, with SHAP attributing forecasts to interpretable, finance-relevant phrases. The results underscore the value of finance-focused word embeddings and offer a practical, transparent NLP framework for volatility forecasting with potential applications beyond RV.

Abstract

We examine whether news can improve realised volatility forecasting using a modern yet operationally simple NLP framework. News text is transformed into embedding-based representations, and forecasts are evaluated both as a standalone, news-only model and as a complement to standard realised volatility benchmarks. In out-of-sample tests on a cross-section of stocks, news contains useful predictive information, with stronger effects for stock-related content and on high-volatility days. Combining the news-based signal with a leading benchmark yields consistent improvements in statistical performance and economically meaningful gains, while explainability analysis highlights the news themes most relevant for volatility.
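
As a rough illustration of what combining a news-based signal with a HAR benchmark can look like, the sketch below fits a textbook HAR regression (next-day RV on the daily value and on 5-day and 22-day averages) and blends its forecast with an NLP forecast. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 5/22-day windows, the plain least-squares fit, and the equal-weight blend are all illustrative choices that the text above does not specify.

```python
import numpy as np

def har_features(rv):
    """Build HAR regressors (intercept, daily, weekly, monthly RV) and next-day targets."""
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    for t in range(21, len(rv) - 1):
        daily = rv[t]
        weekly = rv[t - 4:t + 1].mean()    # mean of the last 5 trading days
        monthly = rv[t - 21:t + 1].mean()  # mean of the last 22 trading days
        rows.append([1.0, daily, weekly, monthly])
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)

def fit_har(rv):
    """Estimate HAR coefficients by ordinary least squares."""
    X, y = har_features(rv)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def har_forecast(beta, rv):
    """One-step-ahead HAR forecast from the most recent 22 observations."""
    rv = np.asarray(rv, dtype=float)
    x = np.array([1.0, rv[-1], rv[-5:].mean(), rv[-22:].mean()])
    return float(x @ beta)

def ensemble(har_pred, nlp_pred, weight=0.5):
    """Blend two forecasts; weight=0.5 (an assumption) gives a simple average."""
    return weight * har_pred + (1.0 - weight) * nlp_pred
```

Here `nlp_pred` stands in for the output of any news-based model; in practice the blend would be evaluated out of sample against the HAR forecast alone.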

Paper Structure

This paper contains 29 sections, 37 equations, 9 figures, 11 tables.

Figures (9)

  • Figure 1: Monthly Corpus Sample and Word Count
  • Figure 2: An Abstract Representation of the NLP Model
  • Figure 3: A Detailed Representation of the NLP Model
  • Figure 4: Distribution of Daily Tokens
  • Figure 5: Out-of-Sample Word Cloud
  • ...and 4 more figures