
Interpretable Machine Learning for Macro Alpha: A News Sentiment Case Study

Yuke Zhang

TL;DR

This work develops an interpretable macro-finance framework that converts global news sentiment into actionable next-day trading signals for FX and U.S. Treasury futures. Using FinBERT to score GDELT headlines and SHAP to interpret an XGBoost classifier, the authors construct a rich set of daily sentiment features and cross-asset market indicators, then evaluate with a rigorous expanding-window backtest. The results show strong out-of-sample performance with high risk-adjusted returns, and SHAP analyses confirm that sentiment dispersion and article impact are key drivers, supporting the feasibility of transparent macro alpha generation from alternative data. The approach advances reproducibility and interpretability in macro trading, offering a practical blueprint for deploying NLP-driven signals in regulated settings.
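The step from scored headlines to daily sentiment features can be pictured with a short sketch. It assumes that a FinBERT-style classifier has already scored each GDELT headline into positive and negative probabilities. It takes a headline's net tone as P(positive) minus P(negative) and aggregates per day into a mean tone, a dispersion (population standard deviation of net tone) and an impact proxy, mirroring the three index types the abstract lists. The function names, the net-tone rule and the impact definition (log of one plus the headline count) are hypothetical illustrations, not the paper's specification.

```python
import math
import statistics
from collections import defaultdict


def headline_score(p_pos, p_neg):
    """Net tone of one headline, P(positive) - P(negative), in [-1, 1].

    Hypothetical rule: assumes a FinBERT-style classifier returns per-class
    probabilities; the paper's exact scoring may differ.
    """
    return p_pos - p_neg


def daily_sentiment_features(rows):
    """Aggregate (date, p_pos, p_neg) rows into one feature dict per day.

    Feature names are hypothetical:
      mean_tone  - average net tone of the day's headlines
      dispersion - population std. dev. of net tone (0.0 for a single headline)
      impact     - illustrative proxy: log(1 + number of headlines that day)
    """
    by_day = defaultdict(list)
    for date, p_pos, p_neg in rows:
        by_day[date].append(headline_score(p_pos, p_neg))

    features = {}
    for date in sorted(by_day):  # chronological order for ISO-format dates
        scores = by_day[date]
        features[date] = {
            "mean_tone": statistics.fmean(scores),
            "dispersion": statistics.pstdev(scores),
            "impact": math.log1p(len(scores)),
        }
    return features


rows = [
    ("2024-01-02", 0.8, 0.1),  # net tone 0.7
    ("2024-01-02", 0.2, 0.6),  # net tone -0.4
    ("2024-01-03", 0.5, 0.5),  # net tone 0.0
]
feats = daily_sentiment_features(rows)
```

On this toy input, two same-day headlines with net tones of 0.7 and -0.4 give a mean tone of 0.15 and a dispersion of 0.55. High dispersion marks days on which the news flow disagrees with itself, even when the average tone is close to neutral.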

Abstract

This study introduces an interpretable machine learning (ML) framework to extract macroeconomic alpha from global news sentiment. We process the worldwide news feed of the Global Database of Events, Language, and Tone (GDELT) Project using FinBERT, a model based on Bidirectional Encoder Representations from Transformers (BERT) and pretrained on finance-specific language, to construct daily sentiment indices that capture mean tone, dispersion, and event impact. These indices feed an XGBoost classifier, benchmarked against logistic regression, that predicts the direction of next-day returns for EUR/USD, USD/JPY, and 10-year U.S. Treasury futures (ZN). Rigorous out-of-sample (OOS) backtesting with 5-fold expanding-window cross-validation (OOS period: c. 2017 to April 2025) demonstrates exceptional cost-adjusted performance for the XGBoost strategy. Sharpe ratios reach 5.87 (EUR/USD), 4.65 (USD/JPY), and 4.65 (Treasuries), and compound annual growth rates (CAGRs) exceed 50% for the foreign exchange (FX) pairs and 22% for Treasuries. Analyses based on SHapley Additive exPlanations (SHAP) confirm that sentiment dispersion and article impact are key predictive features. Our findings establish that integrating domain-specific natural language processing (NLP) with interpretable ML offers a potent and explainable source of macro alpha.
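The evaluation described in the abstract combines expanding-window folds, next-day directional bets and cost-adjusted Sharpe and CAGR figures. It can be sketched in plain Python under several assumptions:

- Folds follow the layout of scikit-learn's TimeSeriesSplit.
- The strategy goes long when the predicted up-probability exceeds 0.5 and short otherwise.
- Costs are a hypothetical flat charge per unit of turnover.
- The risk-free rate is zero.
- A year has 252 trading days.

All helper names are hypothetical, and none of these choices is taken from the paper.

```python
import math
import statistics


def expanding_window_splits(n_samples, n_splits=5):
    """Yield (train_idx, test_idx) pairs; training days always precede test days.

    Layout mirrors scikit-learn's TimeSeriesSplit: the last n_splits blocks of
    size n_samples // (n_splits + 1) serve as test sets, and each fold trains
    on everything before its test block, so the training window keeps growing.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    for i in range(n_splits):
        test_start = n_samples - (n_splits - i) * test_size
        yield list(range(test_start)), list(range(test_start, test_start + test_size))


def strategy_returns(prob_up, next_day_returns, cost_per_trade=0.0001):
    """Daily P&L of a long/short rule, net of a flat cost per unit of turnover.

    Position is +1 when P(up) > 0.5, else -1. The cost is charged on the
    absolute change in position, so entering costs 1 unit and flipping from
    long to short (or back) costs 2 units. cost_per_trade is hypothetical.
    """
    net, prev_pos = [], 0
    for p, r in zip(prob_up, next_day_returns):
        pos = 1 if p > 0.5 else -1
        net.append(pos * r - cost_per_trade * abs(pos - prev_pos))
        prev_pos = pos
    return net


def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    mean = statistics.fmean(daily_returns)
    return mean / statistics.stdev(daily_returns) * math.sqrt(periods_per_year)


def cagr(daily_returns, periods_per_year=252):
    """Compound annual growth rate of the compounded equity curve."""
    growth = math.prod(1.0 + r for r in daily_returns)
    years = len(daily_returns) / periods_per_year
    return growth ** (1.0 / years) - 1.0


splits = list(expanding_window_splits(12, 5))
rets = strategy_returns([0.9, 0.8, 0.2], [0.01, -0.02, -0.01], cost_per_trade=0.001)
```

In a full pipeline, the probabilities for each test block would come from a classifier (XGBoost or the logistic-regression benchmark) fitted only on that fold's training indices. This keeps future information out of every prediction.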

Paper Structure

This paper contains 16 sections, 2 equations, 1 figure, and 1 table.

Figures (1)

  • Figure 1: SHAP summary plot for the EUR/USD XGBoost model. Each point is the Shapley value of one feature for one instance (day). Features are ranked from top to bottom by global importance, defined as the sum of absolute SHAP values across all instances. The horizontal axis shows the SHAP value, which is the impact on model output in log-odds space. Color indicates the feature's value for that instance (red for high, blue for low). Points to the right of the zero line indicate that the feature pushed the model toward predicting an upward move (positive class); points to the left indicate a push toward a downward move.
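A small sketch can check two properties that the caption relies on. First, global importance is the column-wise sum of absolute SHAP values. Second, SHAP values are additive in log-odds, so passing the base value plus one day's SHAP values through the logistic function recovers the model's predicted up-probability. The SHAP matrix, base value and feature names below are made-up inputs, not the paper's results. With the shap library, the matrix would typically come from a tree explainer applied to the fitted XGBoost model.

```python
import math


def global_importance(shap_values, feature_names):
    """Rank features by the sum of |SHAP| over instances (rows = days)."""
    totals = [0.0] * len(feature_names)
    for row in shap_values:
        for j, v in enumerate(row):
            totals[j] += abs(v)
    return sorted(zip(feature_names, totals), key=lambda t: t[1], reverse=True)


def predicted_probability(base_value, shap_row):
    """SHAP values add up in log-odds: logit = base_value + sum of phi_j.

    The logistic function maps that logit back to P(upward move).
    """
    logit = base_value + sum(shap_row)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical feature names and SHAP values for two days.
names = ["mean_tone", "dispersion", "impact"]
phi = [
    [0.1, -0.5, 0.2],
    [-0.2, 0.4, -0.1],
]
ranking = global_importance(phi, names)
```

Dividing each total by the number of days gives the mean absolute SHAP value. Because every feature is divided by the same count, the ranking does not change.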