A Novel Methodology in Credit Spread Prediction Based on Ensemble Learning and Feature Selection

Yu Shao; Jiawen Bai; Yingze Hou; Xia'an Zhou; Zhanhao Pan

TL;DR

The paper addresses credit spread prediction for investment-grade bonds with a stacking ensemble framework augmented by mutual-information-based feature selection. It constructs a pool of 34 candidate features and reduces it to 20 informative ones via mutual information, $I(X;Y)=h(X)-h(X|Y)$. The model has two layers: base learners (MLP, Random Forest, K-NN) feed a Kernel Ridge meta-learner, combined with PCA whitening and inclusion of the recent average spread. Empirically, the stacked model outperforms the individual learners on held-out data. A near-term forecast for February 2019 (approximately 73 bps) is provided and checked against subsequent observations, with errors under 15 bps. The authors also report that mutual-information feature selection improves robustness and interpretability by highlighting key drivers such as long-term yields, GDP, volatility indices, and macro indicators, offering actionable insights for fixed-income trading decisions.
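To make the feature-selection step concrete, the following is a minimal sketch of mutual-information ranking, not the authors' implementation. It assumes a simple equal-width histogram (plug-in) estimator as a discrete stand-in for the differential-entropy form $I(X;Y)=h(X)-h(X|Y)$. The feature names, the synthetic data, and the helpers `bin_index`, `mutual_information`, and `select_top_k` are all hypothetical.

```python
import math
import random
from collections import Counter


def bin_index(value, lo, hi, bins):
    """Map a value to an equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # the maximum value falls into the last bin


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(X;Y) in nats.

    Assumption: equal-width binning approximates the continuous quantity;
    the estimator used in the paper may differ.
    """
    n = len(x)
    x_lo, x_hi = min(x), max(x)
    y_lo, y_hi = min(y), max(y)
    xb = [bin_index(v, x_lo, x_hi, bins) for v in x]
    yb = [bin_index(v, y_lo, y_hi, bins) for v in y]
    joint = Counter(zip(xb, yb))
    px = Counter(xb)
    py = Counter(yb)
    mi = 0.0
    for (i, j), count in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (px[i] * py[j]))
    return mi


def select_top_k(features, target, k, bins=8):
    """Rank features by estimated MI with the target and keep the top k."""
    scores = {name: mutual_information(col, target, bins)
              for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Synthetic illustration only: one informative feature and one pure-noise feature.
rng = random.Random(0)
n = 500
long_yield = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical feature
noise = [rng.gauss(0, 1) for _ in range(n)]        # unrelated to the target
spread = [2.0 * v + 0.1 * rng.gauss(0, 1) for v in long_yield]

features = {"long_yield": long_yield, "noise": noise}
selected, scores = select_top_k(features, spread, k=1)
print(selected, scores)
```

In the paper's setting, `features` would hold the 34 candidate series and `k` would be 20. The bin count is a tuning choice: too few bins blur the dependence, while too many inflate the estimate on small samples.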

Abstract

The credit spread is a key indicator in bond investments, offering valuable insights for fixed-income investors to devise effective trading strategies. This study proposes a novel credit spread forecasting model leveraging ensemble learning techniques. To enhance predictive accuracy, a feature selection method based on mutual information is incorporated. Empirical results demonstrate that the proposed methodology delivers superior accuracy in credit spread predictions. Additionally, we present a forecast of future credit spread trends using current data, providing actionable insights for investment decision-making.
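The two-layer stacking architecture named in the TL;DR (MLP, Random Forest, and K-NN base learners with a Kernel Ridge meta-learner, plus PCA whitening) can be sketched with scikit-learn as below. This is an illustrative sketch under stated assumptions rather than the authors' code: the data are synthetic, all hyperparameters are placeholders, whitening is assumed to be applied to the inputs before the base learners, and the recent-average-spread input is omitted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 20 selected features and the spread target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

# Layer 1: base learners (hyperparameters are placeholders, not from the paper).
base_learners = [
    ("mlp", MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
]

# Layer 2: Kernel Ridge meta-learner trained on out-of-fold base predictions.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=KernelRidge(alpha=1.0, kernel="rbf"),
    cv=5,
)

# Assumption: standardize, then PCA-whiten the inputs ahead of the stack.
model = make_pipeline(StandardScaler(), PCA(whiten=True), stack)

model.fit(X[:150], y[:150])        # first 150 rows for training
pred = model.predict(X[150:])      # last 50 rows held out
print(pred[:5])
```

`StackingRegressor` fits the meta-learner on cross-validated predictions of the base learners, which keeps the second layer from simply memorizing in-sample fits from the first layer.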
