A Spatio-Temporal Machine Learning Model for Mortgage Credit Risk: Default Probabilities and Loan Portfolios

Pascal Kündig, Fabio Sigrist

TL;DR

The paper introduces a framework that combines tree-boosting with a latent spatio-temporal Gaussian process to model mortgage credit risk, capturing non-linear predictor effects and unobserved spatio-temporal frailty. Applying LaGaBoost with Vecchia-Laplace approximations to Freddie Mac mortgage data, it reports more accurate individual default probabilities and loan portfolio loss distributions than conventional independent linear hazard models and linear spatio-temporal models. Interpretability analyses via SHAP reveal strong interactions and non-linear effects among predictors, as well as meaningful spatio-temporal frailty patterns. The work offers practical gains for risk forecasting and portfolio risk management, while highlighting room for refinement in the choice of spatio-temporal covariance and in data granularity.

Abstract

We introduce a novel machine learning model for credit risk by combining tree-boosting with a latent spatio-temporal Gaussian process model accounting for frailty correlation. This allows for modeling non-linearities and interactions among predictor variables in a flexible data-driven manner and for accounting for spatio-temporal variation that is not explained by observable predictor variables. We also show how estimation and prediction can be done in a computationally efficient manner. In an application to a large U.S. mortgage credit risk data set, we find that both predictive default probabilities for individual loans and predictive loan portfolio loss distributions obtained with our novel approach are more accurate compared to conventional independent linear hazard models and also linear spatio-temporal models. Using interpretability tools for machine learning models, we find that the likely reasons for this outperformance are strong interaction and non-linear effects in the predictor variables and the presence of spatio-temporal frailty effects.
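
As a reading aid, the model structure described in the abstract can be sketched as follows. The notation is assumed here for illustration and is not taken from the paper: $Y_{it}$ is a default indicator for loan $i$ at time $t$, $x_{it}$ its predictor variables, $s_i$ its location, $F$ a tree-boosted ensemble, $b$ the latent spatio-temporal frailty, $g$ an unspecified link function, and $c_\theta$ a covariance function with parameters $\theta$.

$$
P(Y_{it} = 1 \mid x_{it}, b) = g^{-1}\big(F(x_{it}) + b(s_i, t)\big), \qquad b \sim \mathcal{GP}(0, c_\theta)
$$

Under this sketch, the conventional independent linear hazard model corresponds to $F(x) = x^\top \beta$ with $b \equiv 0$, and a linear spatio-temporal model keeps $b$ but restricts $F$ to be linear. Because loans that are close in space and time share similar values of $b$, their defaults are correlated, which is what affects the tails of a portfolio loss distribution.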

Paper Structure (23 sections, 14 equations, 15 figures, 8 tables, 1 algorithm)

Figures (15)

  • Figure 1: Number of defaults and default rate over time.
  • Figure 2: Spatial default rates. No data is available for the gray areas.
  • Figure 3: Temporal out-of-sample test area under the receiver operating characteristic curve (AUC) (higher = better).
  • Figure 4: Differences between means of the predictive loss distributions and realized portfolio losses.
  • Figure 5: Predictive $99\%$ quantiles of one-year-ahead loan portfolio loss distributions.
  • ...and 10 more figures