Data-Driven Dynamic Factor Modeling via Manifold Learning

Graeme Baker, Agostino Capponi, J. Antonio Sidaoui

TL;DR

This paper introduces a data-driven dynamic factor framework for modeling the joint evolution of high-dimensional covariates and responses without parametric assumptions. Applied to equity-portfolio stress testing, it achieves mean absolute error improvements of up to 55% over classical scenario analysis and 39% over principal component analysis benchmarks.

Abstract

We introduce a data-driven dynamic factor framework for modeling the joint evolution of high-dimensional covariates and responses without parametric assumptions. Standard factor models applied to covariates alone often lose explanatory power for responses. Our approach uses anisotropic diffusion maps, a manifold learning technique, to learn low-dimensional embeddings that preserve both the intrinsic geometry of the covariates and the predictive relationship with responses. For time series arising from Langevin diffusions in Euclidean space, we show that the associated graph Laplacian converges to the generator of the underlying diffusion. We further establish a bound on the approximation error between the diffusion map coordinates and linear diffusion processes, and we show that ergodic averages in the embedding space converge under standard spectral assumptions. These results justify using Kalman filtering in diffusion-map coordinates for predicting joint covariate-response evolution. We apply this methodology to equity-portfolio stress testing using macroeconomic and financial variables from Federal Reserve supervisory scenarios, achieving mean absolute error improvements of up to 55% over classical scenario analysis and 39% over principal component analysis benchmarks.
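
To make the embedding step concrete, the following is a minimal sketch of a standard diffusion map (Gaussian kernel with density normalization), written with NumPy. It is not the anisotropic, response-aware construction used in the paper; the function name `diffusion_map`, the median bandwidth heuristic, the choice `alpha=1.0`, and the noisy-circle test data are all assumptions made for illustration.

```python
import numpy as np

def diffusion_map(X, n_coords=2, epsilon=None, alpha=1.0):
    """Standard diffusion map sketch (hypothetical helper, not the paper's
    anisotropic variant). Rows of X are samples."""
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if epsilon is None:
        epsilon = np.median(d2)  # bandwidth heuristic (an assumption)
    K = np.exp(-d2 / epsilon)

    # Density normalization: K_alpha = K / (q_i q_j)^alpha.
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q, q) ** alpha

    # Symmetric conjugate S = D^{-1/2} K_alpha D^{-1/2} of the Markov
    # matrix P = D^{-1} K_alpha; S and P share eigenvalues.
    d = K_alpha.sum(axis=1)
    S = K_alpha / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = evecs / np.sqrt(d)[:, None]

    # Drop the trivial constant eigenvector (eigenvalue 1).
    return evals[1:n_coords + 1], psi[:, 1:n_coords + 1]

# Illustrative data: noisy samples from a circle embedded in the plane.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=200)
X = np.column_stack([np.cos(angles), np.sin(angles)])
X = X + 0.01 * rng.standard_normal((200, 2))

evals, coords = diffusion_map(X, n_coords=2)
```

The graph Laplacian referred to in the abstract corresponds, in this simplified setting, to the matrix `(P - I) / epsilon` built from the same kernel; the eigenvectors returned here serve as the low-dimensional coordinates.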

Paper Structure

This paper contains 44 sections, 5 theorems, 118 equations, 11 figures, 6 tables, 6 algorithms.

Key Result

Lemma 4.1

Suppose $\theta_0\sim \mu_0$ where $\mu_0$ admits a Radon--Nikodym derivative with respect to the invariant measure $\mu$ such that $\|\frac{\mathrm{d}\mu_0}{\mathrm{d}\mu}-1\|_{\mathbb{L}^2(\mu)}<\infty$, and let $\mu_t$ denote the distribution of $\theta_t$. If $f\in \mathbb{L}^2(\mu)$ then for $t

Figures (11)

  • Figure 1: Comparisons showing that $\ell=10$ diffusion coordinates accurately reconstruct the OU and CIR dynamics from Example 3.3.
  • Figure 2: Eigenfunction recovery and path reconstructions for the simulation of Example 3.3.
  • Figure 3: A flow map of the Joint Diffusion Kalman Filter (JDKF) procedure. Starting from observations $z(t_i)$, we apply anisotropic diffusion maps to obtain embeddings $\psi(t_i)$ that capture the manifold geometry. We propagate a linear approximation of the diffusion coordinates forward in time. We predict coordinates using a Kalman filter and lift back to the measurement space using the linear operator $\mathbf{H}$.
  • Figure 4: Diagram for scenario analysis in stress testing. Input factors (macroeconomic variables, market data, portfolio composition) feed into scenario definitions, which can be historical (e.g., 2008 crisis, COVID-19), hypothetical (e.g., interest rate shock, oil price surge), or generated via Monte Carlo simulations. We then assess the portfolio change under stress and evaluate it through risk metrics such as Value-at-Risk (VaR), Expected Shortfall, stress losses, and regulatory capital requirements (Basel, CCAR).
  • Figure 5: Structure of the historical rolling backtest with periodic Kalman filter refitting. The timeline is divided into refitting periods and backtesting periods. During each refitting period, the Kalman filter is fitted once to obtain filtered estimates $\{\hat{\psi}_t\}$ and $\{\hat{z}_t\}$. Within each backtesting period, models are trained on rolling windows and predictions are made one period ahead. We show the full 200-month timeline spanning June 2004-December 2016 with three complete refitting cycles. The insets illustrate that after a burn-in period, the Kalman filter runs through the refitting period to stabilize estimates before rolling predictions begin.
  • ...and 6 more figures
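
The predict-and-lift step described in the Figure 3 caption can be sketched as a plain linear Kalman filter in which the latent state plays the role of the diffusion coordinates $\psi(t_i)$ and the matrix `H` lifts them back to the measurement space. This is only an illustration under stated assumptions: the transition matrix `A`, the noise covariances `Q` and `R`, the random `H`, and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np

def kalman_filter(z, A, H, Q, R, x0, P0):
    """Linear Kalman filter sketch. z has shape (T, m). Returns filtered
    states (T, l) and one-step-ahead predicted observations (T, m)."""
    T, m = z.shape
    l = A.shape[0]
    x, P = x0.copy(), P0.copy()
    states = np.zeros((T, l))
    z_pred = np.zeros((T, m))
    for t in range(T):
        # Predict: propagate the linear approximation of the coordinates.
        x = A @ x
        P = A @ P @ A.T + Q
        z_pred[t] = H @ x  # lift the prediction to measurement space
        # Update: correct the state with the new observation.
        S = H @ P @ H.T + R
        K = np.linalg.solve(S, H @ P).T  # gain K = P H^T S^{-1}
        x = x + K @ (z[t] - H @ x)
        P = (np.eye(l) - K @ H) @ P
        states[t] = x
    return states, z_pred

# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(1)
T, l, m = 300, 2, 5
c, s = np.cos(0.1), np.sin(0.1)
A = 0.95 * np.array([[c, -s], [s, c]])  # damped rotation dynamics
H = rng.standard_normal((m, l))         # stand-in for the lift operator
Q = 0.01 * np.eye(l)
R = 0.25 * np.eye(m)

x_true = np.zeros((T, l))
x_prev = rng.standard_normal(l)         # initial state drawn from N(0, I)
for t in range(T):
    x_prev = A @ x_prev + rng.multivariate_normal(np.zeros(l), Q)
    x_true[t] = x_prev
clean = x_true @ H.T
z = clean + rng.multivariate_normal(np.zeros(m), R, size=T)

states, z_pred = kalman_filter(z, A, H, Q, R, np.zeros(l), np.eye(l))
err_raw = np.mean((z - clean) ** 2)
err_filt = np.mean((states @ H.T - clean) ** 2)
```

In this toy setup, `err_filt` compares the lifted filtered states against the noise-free signal and `err_raw` does the same for the raw observations, which gives a quick check that filtering in the low-dimensional coordinates reduces measurement noise.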

Theorems & Definitions (17)

  • Remark 3.1
  • Remark 3.2
  • Example 3.3: OU and CIR Processes
  • Remark 3.4
  • Definition 3.5
  • Remark 3.6
  • Lemma 4.1
  • proof
  • Proposition 4.2: Kipnis and Varadhan (1986)
  • Theorem 4.3
  • ...and 7 more