Assessing Uncertainty in Stock Returns: A Gaussian Mixture Distribution-Based Method
Yanlong Wang, Jian Xu, Shao-Lun Huang, Danny Dongning Sun, Xiao-Ping Zhang
TL;DR
The paper tackles uncertainty in stock returns by modeling the predictive distribution as a Gaussian mixture, which captures skewness and heavy tails that single-distribution approaches miss. It introduces MDNe, which combines Crossformer-based time-series processing with a stock-code embedding and outputs a nine-component Gaussian mixture for each asset, trained by maximum likelihood. Empirical results on 3,226 Chinese A-share stocks (2018–2022) show that MDNe outperforms GARCH-family models on CRPS, MSE, and QLIKE; MDN is stronger in low-volatility regimes, while MDNe performs better during high-volatility periods. Diebold-Mariano tests corroborate these advantages, and the results are robust across multiple training runs. The work also provides a visualization framework that uses bag-of-words stock-code embeddings and t-SNE to reveal clusters of assets with similar risk profiles, supporting portfolio management and risk mitigation. Together, distributional forecasting and interpretable embedding-based visualization make the approach practical for risk modeling in financial markets.
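To illustrate why a Gaussian mixture can express skewness and why maximum likelihood is a natural training objective, here is a minimal sketch in plain Python. It is not the authors' implementation: the two-component parameters below are hypothetical, and the mixture is held fixed, whereas in the model described above the weights, means, and standard deviations would be network outputs for each asset and time step.

```python
import math

def mixture_log_density(x, weights, means, sigmas):
    """Log density of a Gaussian mixture at x, using log-sum-exp for stability.

    Assumes all weights are strictly positive and sum to one.
    """
    log_terms = [
        math.log(w) - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        - 0.5 * ((x - m) / s) ** 2
        for w, m, s in zip(weights, means, sigmas)
    ]
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

def mixture_nll(returns, weights, means, sigmas):
    """Average negative log-likelihood: the quantity maximum likelihood minimizes."""
    total = sum(mixture_log_density(r, weights, means, sigmas) for r in returns)
    return -total / len(returns)

def mixture_moments(weights, means, sigmas):
    """Mean, variance, and skewness of a Gaussian mixture in closed form."""
    mean = sum(w * m for w, m in zip(weights, means))
    var = sum(w * (s * s + (m - mean) ** 2)
              for w, m, s in zip(weights, means, sigmas))
    third = sum(w * ((m - mean) ** 3 + 3.0 * (m - mean) * s * s)
                for w, m, s in zip(weights, means, sigmas))
    return mean, var, third / var ** 1.5

# Hypothetical daily-return mixture: a calm regime plus a rare, wide, negative one.
weights = [0.9, 0.1]
means = [0.0, -0.05]
sigmas = [0.01, 0.04]

mean, var, skew = mixture_moments(weights, means, sigmas)
print("mean:", mean, "variance:", var, "skewness:", skew)
print("NLL:", mixture_nll([0.01, -0.02, 0.003], weights, means, sigmas))
```

Every component is symmetric, yet the mixture as a whole is negatively skewed because the low-weight component sits to the left with a larger spread. A single Gaussian cannot reproduce this.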
Abstract
This study seeks to advance the understanding and prediction of stock market return uncertainty through the application of advanced deep learning techniques. We introduce a novel deep learning model that utilizes a Gaussian mixture distribution to capture the complex, time-varying nature of asset return distributions in the Chinese stock market. By incorporating the Gaussian mixture distribution, our approach effectively characterizes short-term fluctuations and non-traditional features of stock returns, such as skewness and heavy tails, that are often overlooked by traditional models. Compared to GARCH models and their variants, our method demonstrates superior performance in volatility estimation, particularly during periods of heightened market volatility. It provides more accurate volatility forecasts and offers unique risk insights for different assets, thereby deepening the understanding of return uncertainty. Additionally, we propose a novel use of code embedding, which uses a bag-of-words approach to train hidden representations of stock codes and transforms the uncertainty attributes of stocks into high-dimensional vectors. These vectors are subsequently reduced to two dimensions, allowing the similarity among different stocks to be observed. This visualization facilitates the identification of asset clusters with similar risk profiles, offering valuable insights for portfolio management and risk mitigation. Because we predict the uncertainty of returns by estimating their latent distribution, and the true distribution is unobservable, the predicted distribution cannot be evaluated against it directly. Instead, we use the continuous ranked probability score (CRPS) to assess how well the predicted distribution matches the realized returns, and the MSE and QLIKE metrics to evaluate the error between the volatility level of the predicted distribution and proxy measures of true volatility.
