A Modified CTGAN-Plus-Features Based Method for Optimal Asset Allocation
José-Manuel Peña, Fernando Suárez, Omar Larré, Domingo Ramírez, Arturo Cifuentes
TL;DR
This work tackles robust asset allocation under tail risk by combining CVaR-constrained optimization with regime-aware synthetic data generation. It introduces a Modified CTGAN that takes contextual yield-curve features as input and produces realistic synthetic returns, yielding scenario sets that are more representative than historical data alone. The optimization is reformulated as a discretized linear program with objective $\max \mathbf{x}^\top \mathbf{R} \bm{\pi}$, in which a loss threshold $\zeta$ and auxiliary variables $\mathbf{z}$ enforce the CVaR constraint, and the weights $\mathbf{x}$ are nonnegative and sum to 1. Feature-based weighting via $\bm{\pi}_{\bm{f}}$ aligns the scenarios with the current economic state. Empirical results over ten asset classes and roughly 14.5 years show better out-of-sample performance than equal weighting and than formulations based on historical data only, which makes the method relevant for long-horizon, index-based portfolios.
Abstract
We propose a new approach to portfolio optimization that combines synthetic data generation with a CVaR constraint. We formulate the portfolio optimization problem as an asset allocation problem in which each asset class is accessed through a passive (index) fund. The asset-class weights are determined by solving an optimization problem that includes a CVaR constraint. Synthetic return scenarios are generated by a Modified CTGAN algorithm that incorporates features (contextual information), and these scenarios are, in turn, fed into the optimization engine. For contextual information we rely on several points along the U.S. Treasury yield curve. The merits of this approach are demonstrated with an example based on ten asset classes (covering stocks, bonds, and commodities) over a fourteen-and-a-half-year period (January 2008 to June 2022). We show that the synthetic generation process captures the key characteristics of the original data well, and that the optimization scheme produces portfolios with satisfactory out-of-sample performance. Finally, we show that this approach outperforms the conventional equal-weights (1/N) asset allocation strategy and other optimization formulations based on historical data only.
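
To make the CVaR-constrained linear program mentioned above more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard Rockafellar-Uryasev linearization of CVaR, which this section does not spell out, and it assumes that $\mathbf{R}$ is an assets-by-scenarios return matrix and $\bm{\pi}$ a vector of scenario probabilities. The function name `solve_cvar_lp` and the parameters `alpha` (confidence level) and `cvar_limit` (maximum tolerated CVaR) are hypothetical names introduced for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def solve_cvar_lp(R, pi, alpha=0.95, cvar_limit=0.05):
    """Maximize x^T R pi subject to a CVaR limit (illustrative sketch).

    R  : (n_assets, n_scenarios) scenario returns (assumed layout)
    pi : (n_scenarios,) scenario probabilities summing to 1
    Returns the weights x and the loss threshold zeta.
    """
    R = np.asarray(R, dtype=float)
    pi = np.asarray(pi, dtype=float)
    n, S = R.shape

    # Decision vector v = [x (n weights), zeta (1 value), z (S auxiliaries)].
    # linprog minimizes, so the expected return enters with a minus sign.
    c = np.concatenate([-(R @ pi), [0.0], np.zeros(S)])

    # CVaR constraint: zeta + (1 / (1 - alpha)) * sum_s pi_s * z_s <= cvar_limit
    cvar_row = np.concatenate([np.zeros(n), [1.0], pi / (1.0 - alpha)])
    # Per-scenario constraints: z_s >= loss_s - zeta, with loss_s = -r_s^T x,
    # written as -r_s^T x - zeta - z_s <= 0.
    scen_rows = np.hstack([-R.T, -np.ones((S, 1)), -np.eye(S)])
    A_ub = np.vstack([cvar_row, scen_rows])
    b_ub = np.concatenate([[cvar_limit], np.zeros(S)])

    # Budget constraint: the weights sum to 1.
    A_eq = np.concatenate([np.ones(n), [0.0], np.zeros(S)])[None, :]
    b_eq = np.array([1.0])

    # x >= 0 (long-only), zeta free, z >= 0.
    bounds = [(0, None)] * n + [(None, None)] + [(0, None)] * S

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[n]
```

In this sketch, passing uniform probabilities for `pi` treats all scenarios equally, while a feature-based weighting such as $\bm{\pi}_{\bm{f}}$ would be supplied as a non-uniform `pi`. How those weights are computed is not covered here.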
