Diffusion Factor Models: Generating High-Dimensional Returns with Factor Structure

Minshuo Chen, Renyuan Xu, Yumin Xu, Ruixun Zhang

TL;DR

This work introduces the diffusion factor model, a diffusion-based generator that exploits latent factor structure in high-dimensional asset returns to address the curse of dimensionality and small data. It derives a time-varying score decomposition into a low-dimensional subspace component and a linear complement, guiding a factor-aware encoder–decoder score network. The authors prove nonasymptotic error bounds for score estimation and distribution recovery that scale with the latent dimension $k$ rather than the ambient dimension $d$, and demonstrate latent subspace recovery via generated samples. Numerical experiments on synthetic data show improved latent-space recovery and smoother generated return distributions, while empirical analysis on US stock data shows diffusion-generated data enhances mean–variance portfolios and factor-tangency portfolios. Together, these results establish a principled framework for high-dimensional, data-scarce financial simulation with theoretical guarantees and practical portfolio applications.

Abstract

Financial scenario simulation is essential for risk management and portfolio optimization, yet it remains challenging, especially in the high-dimensional, small-data settings common in finance. We propose a diffusion factor model that integrates latent factor structure into generative diffusion processes, bridging econometrics with modern generative AI to address the curse of dimensionality and data scarcity in financial simulation. By exploiting the low-dimensional factor structure inherent in asset returns, we decompose the score function (a key component in diffusion models) using time-varying orthogonal projections, and we incorporate this decomposition into the design of neural network architectures. We derive rigorous statistical guarantees, establishing nonasymptotic error bounds of $O(d^{5/2} n^{-2/(k+5)})$ for score estimation and $O(d^{5/4} n^{-1/2(k+5)})$ for the generated distribution. These bounds are driven primarily by the intrinsic factor dimension $k$ rather than the number of assets $d$, surpassing the dimension-dependent limits in the classical nonparametric statistics literature and making the framework viable for markets with thousands of assets. Numerical studies confirm superior performance in latent subspace recovery under small data regimes. Empirical analysis demonstrates the economic significance of our framework in constructing mean-variance optimal portfolios and factor portfolios. This work presents the first theoretical integration of factor structure with diffusion models, offering a principled approach for high-dimensional financial simulation with limited data. Our code is available at https://github.com/xymmmm00/diffusion_factor_model.
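As a rough arithmetic illustration of why it matters that the exponent depends on $k$ rather than $d$, the sketch below evaluates only the sample-size factor $n^{-2/(k+5)}$ from the score-estimation bound. It ignores constants, logarithmic terms, and the $d^{5/2}$ prefactor. The comparison that puts $d$ in the exponent is a hypothetical stand-in for a dimension-dependent rate, not a bound quoted from the paper, and the function name and the values of `n`, `k`, and `d` are illustrative assumptions.

```python
def sample_size_factor(n: int, dim: int) -> float:
    """Sample-size factor n^(-2/(dim+5)) of the score bound, constants dropped."""
    return n ** (-2.0 / (dim + 5))


n = 1000   # assumed number of return observations
k = 5      # assumed latent factor dimension
d = 500    # assumed number of assets

# Factor when the exponent is driven by the latent dimension k.
factor_with_k = sample_size_factor(n, k)

# Hypothetical factor if the exponent were driven by the ambient dimension d.
factor_with_d = sample_size_factor(n, d)

print(f"exponent uses k={k}: {factor_with_k:.4f}")
print(f"exponent uses d={d}: {factor_with_d:.4f}")
```

With the exponent tied to $k$, the factor shrinks noticeably as $n$ grows; with $d$ in the exponent it stays close to 1 at the same sample size, which is the curse of dimensionality the abstract refers to.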

Paper Structure

This paper contains 75 sections, 15 theorems, 182 equations, 8 figures, 11 tables, 1 algorithm.

Key Result

Lemma 1

Suppose the factor assumption holds. The score function $\nabla \log p_t(\mathbf r)$ can be decomposed into a subspace score and a complement score. In this decomposition, $p_t^{\rm{fac}}(\cdot) := \int \phi(\cdot; \alpha_t \mathbf f, \bm\Gamma_t) p_{\rm{fac}}(\mathbf f) {\rm d} \mathbf f$, and $\bm\Lambda_t$, $\bm\Gamma_t$, $\mathbf T_{t}$ are given by the paper's definitions of $\bm\Lambda_t$ and of the projection at time $t$.
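To make the idea of a subspace score plus a linear complement concrete, here is a minimal numerical sketch in a deliberately simplified special case. It is not the lemma's general statement. The assumptions are: Gaussian factors $\mathbf f \sim N(0, I_k)$, a loading matrix `B` with orthonormal columns, isotropic idiosyncratic noise with variance `sigma2`, and a forward process $\mathbf r_t = \alpha_t \mathbf r_0 + \sqrt{h_t}\,\mathbf z$ with $\alpha_t = e^{-t/2}$ and $h_t = 1 - e^{-t}$. All names (`B`, `sigma2`, `ou_coeffs`, and so on) are hypothetical. Under these assumptions $p_t$ is Gaussian, so the exact score is available and can be compared with the two-part form.

```python
import numpy as np


def ou_coeffs(t):
    """Assumed forward schedule: r_t = alpha_t * r_0 + sqrt(h_t) * z."""
    alpha = np.exp(-t / 2.0)
    h = 1.0 - np.exp(-t)
    return alpha, h


def direct_score(r, B, sigma2, t):
    """Exact score -Cov_t^{-1} r of the Gaussian marginal p_t."""
    alpha, h = ou_coeffs(t)
    d = B.shape[0]
    cov_t = alpha**2 * (B @ B.T + sigma2 * np.eye(d)) + h * np.eye(d)
    return -np.linalg.solve(cov_t, r)


def decomposed_score(r, B, sigma2, t):
    """Same score split into a part in span(B) and a linear complement part."""
    alpha, h = ou_coeffs(t)
    c = alpha**2 * sigma2 + h          # variance orthogonal to span(B)
    proj = B @ (B.T @ r)               # projection of r onto span(B)
    subspace = -proj / (c + alpha**2)  # score along the factor subspace
    complement = -(r - proj) / c       # linear score on the complement
    return subspace, complement


rng = np.random.default_rng(0)
d, k = 50, 3
# QR of a random matrix gives loadings with orthonormal columns (assumption).
B, _ = np.linalg.qr(rng.standard_normal((d, k)))
r = rng.standard_normal(d)

sub, comp = decomposed_score(r, B, sigma2=0.5, t=0.3)
max_gap = np.max(np.abs(sub + comp - direct_score(r, B, sigma2=0.5, t=0.3)))
print("max abs difference:", max_gap)
```

In this special case the projection onto the factor subspace does not change with $t$. The lemma's time-varying $\mathbf T_t$ addresses a more general setting that this sketch does not attempt to reproduce.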

Figures (8)

  • Figure 1: Examples of asset return distribution (the blue is constructed using output samples from the diffusion model and the green is based on samples from the ground truth.)
  • Figure 2: Cumulative returns of different portfolios in log scale with transaction cost for $\eta=3$.
  • Figure 3: Correlation between the top 8 factors obtained using diffusion-based methods and those from the FF Method.
  • Figure E.1: Examples of asset return distribution (the blue histogram is constructed using samples generated from the diffusion model and the green one uses actual data samples.)
  • Figure E.2: Cumulative returns of different portfolios in log scale with transaction cost for $\eta=5$ (model updated quarterly).
  • ...and 3 more figures

Theorems & Definitions (31)

  • Lemma 1
  • Example 1: Gaussian factors
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 2
  • Lemma 3
  • proof
  • proof
  • proof
  • ...and 21 more