Interpretable Machine Learning for Macro Alpha: A News Sentiment Case Study
Yuke Zhang
TL;DR
This work develops an interpretable macro-finance framework that converts global news sentiment into actionable next-day trading signals for FX and U.S. Treasury futures. The authors score GDELT headlines with FinBERT, aggregate the scores into a rich set of daily sentiment features alongside cross-asset market indicators, train an XGBoost classifier whose predictions are interpreted with SHAP, and evaluate the strategy with an expanding-window backtest. The results show strong out-of-sample, cost-adjusted performance (Sharpe ratios between 4.65 and 5.87), and SHAP analyses confirm that sentiment dispersion and article impact are key drivers, supporting the feasibility of transparent macro alpha generation from alternative data. The approach advances reproducibility and interpretability in macro trading and offers a practical blueprint for deploying NLP-driven signals in regulated settings.
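The daily aggregation step is described only at a high level. As a minimal sketch, assuming each GDELT headline has already been scored by FinBERT into positive and negative class probabilities, the code below takes net tone as P(positive) minus P(negative) and computes, for each day, the mean tone, the dispersion (population standard deviation of tone) and an impact-weighted tone. The `impact` field, the field names and the date format are hypothetical stand-ins for the study's article/event impact measure, not its actual definition.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class ScoredHeadline:
    day: str       # trading date, e.g. "2024-03-15" (hypothetical format)
    p_pos: float   # FinBERT probability of the "positive" label
    p_neg: float   # FinBERT probability of the "negative" label
    impact: float  # hypothetical per-article impact weight (illustrative only)


def daily_sentiment_features(headlines):
    """Aggregate per-headline FinBERT scores into one feature row per day."""
    by_day = defaultdict(list)
    for h in headlines:
        by_day[h.day].append(h)

    features = {}
    for day in sorted(by_day):
        rows = by_day[day]
        tones = [h.p_pos - h.p_neg for h in rows]  # net tone in [-1, 1]
        weights = [h.impact for h in rows]
        total_w = sum(weights)
        features[day] = {
            "mean_tone": fmean(tones),
            # population std of tone; 0.0 on days with a single headline
            "dispersion": pstdev(tones),
            # impact-weighted tone; falls back to 0.0 if all weights are zero
            "impact_tone": (
                sum(t * w for t, w in zip(tones, weights)) / total_w
                if total_w > 0 else 0.0
            ),
            "total_impact": total_w,
            "n_articles": len(rows),
        }
    return features


if __name__ == "__main__":
    demo = [
        ScoredHeadline("2024-03-15", 0.80, 0.10, 3.0),
        ScoredHeadline("2024-03-15", 0.20, 0.70, 1.0),
        ScoredHeadline("2024-03-18", 0.50, 0.50, 2.0),
    ]
    for day, row in daily_sentiment_features(demo).items():
        print(day, row)
```

Keeping dispersion as a separate feature from mean tone lets a day with sharply split headlines look different from a day of uniformly neutral coverage, even when their averages coincide.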
Abstract
This study introduces an interpretable machine learning (ML) framework for extracting macroeconomic alpha from global news sentiment. We process the worldwide news feed of the Global Database of Events, Language, and Tone (GDELT) Project with FinBERT, a Bidirectional Encoder Representations from Transformers (BERT) model pretrained on finance-specific language, to construct daily sentiment indices that capture mean tone, dispersion, and event impact. These indices feed an XGBoost classifier, benchmarked against logistic regression, that classifies next-day returns for EUR/USD, USD/JPY, and 10-year U.S. Treasury futures (ZN). In out-of-sample (OOS) backtesting with 5-fold expanding-window cross-validation over an OOS period of roughly 2017 to April 2025, the XGBoost strategy delivers exceptional cost-adjusted performance: Sharpe ratios of 5.87 (EUR/USD), 4.65 (USD/JPY), and 4.65 (Treasuries), with compound annual growth rates (CAGRs) exceeding 50% for the foreign exchange (FX) pairs and 22% for Treasuries. Shapley Additive Explanations (SHAP) analysis confirms that sentiment dispersion and article impact are key predictive features. Our findings establish that combining domain-specific Natural Language Processing (NLP) with interpretable ML offers a potent and explainable source of macro alpha.
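The backtest protocol and the reported metrics are only summarized here. As a hedged sketch, the code below shows one common way to lay out a 5-fold expanding-window split (the same index layout as scikit-learn's `TimeSeriesSplit` with default settings) and to compute a cost-adjusted annualized Sharpe ratio and CAGR from daily strategy returns. The long/short mapping of the classifier's output, the 1 bp per-unit-turnover cost, the zero risk-free rate and the 252 trading days per year are illustrative assumptions, not the study's reported settings; fitting the XGBoost and logistic-regression models inside each fold is omitted.

```python
import math
from statistics import fmean, stdev


def expanding_window_folds(n_samples, n_splits=5):
    """Yield (train_idx, test_idx) pairs with a growing training window.

    Same layout as scikit-learn's TimeSeriesSplit with default settings: the
    last n_splits equal-sized blocks are test folds, and each one is trained
    on every observation that precedes it (no look-ahead).
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    first_test = n_samples - n_splits * test_size
    for k in range(n_splits):
        start = first_test + k * test_size
        yield list(range(start)), list(range(start, start + test_size))


def strategy_returns(signals, next_day_returns, cost_per_turnover=0.0001):
    """Daily P&L of a long/short rule: long if signal == 1, short otherwise.

    signals[t] is the prediction made at the close of day t and
    next_day_returns[t] is the realized return from t to t+1. A proportional
    cost (illustrative 1 bp) is charged per unit of position change, so a
    full flip from long to short costs twice that amount.
    """
    pnl, prev = [], 0
    for s, r in zip(signals, next_day_returns):
        pos = 1 if s == 1 else -1
        pnl.append(pos * r - cost_per_turnover * abs(pos - prev))
        prev = pos
    return pnl


def annualized_sharpe(daily_returns, periods_per_year=252):
    """sqrt(periods) * mean / sample std of daily returns; risk-free rate taken as 0."""
    sd = stdev(daily_returns)
    if sd == 0:
        return float("nan")
    return math.sqrt(periods_per_year) * fmean(daily_returns) / sd


def cagr(daily_returns, periods_per_year=252):
    """Compound annual growth rate implied by compounding the daily returns."""
    growth = math.prod(1 + r for r in daily_returns)
    years = len(daily_returns) / periods_per_year
    return growth ** (1 / years) - 1
```

In this layout, the classifier would be refit on each fold's training indices, and its test-fold predictions concatenated into a single OOS return series before the Sharpe ratio and CAGR are computed, so no metric ever sees a model trained on later data.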
