Modeling Market States with Clustering and State Machines

Christian Oliva, Silviu Gabriel Tinjala

TL;DR

The paper addresses the challenge of modeling financial market states with interpretability and robustness by proposing a framework that clusters multi-horizon momentum and risk features to identify market regimes, then builds a probabilistic state machine from a transition matrix $M \in \mathbb{Z}_+^{K\times K}$. Returns are generated as a state-weighted Gaussian mixture $R \sim \sum_{i=1}^K c_i \mathcal{N}(\mu_i,\sigma_i)$ with weights $c_i$ derived from state frequencies, enabling capture of higher moments. Empirical results show the state-machine approach better matches skewness and kurtosis and yields lower distributional distances (KL, KS, Wasserstein) to real returns than a normal model, robustly across assets and time periods with optimal performance around $K\approx 10$. The framework offers an interpretable regime-aware tool for signal generation and risk management, with potential extensions to volume, macro indicators, and multi-asset regime-aware allocation.
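
To make the two objects named in the TL;DR concrete, here is a minimal Python sketch. It assumes each trading day has already been assigned a cluster label in 0..K-1, counts consecutive label pairs to form the transition matrix $M$, takes $c_i$ as the empirical share of days spent in state $i$, estimates $\mu_i$ and $\sigma_i$ from the returns observed in that state, and then samples from the resulting mixture. The helper names (transition_counts, mixture_params, sample_mixture) and the toy data are hypothetical illustrations, not the authors' code.

```python
import random
from collections import Counter

def transition_counts(labels, k):
    """M[i][j] = number of times state i is immediately followed by state j."""
    m = [[0] * k for _ in range(k)]
    for a, b in zip(labels, labels[1:]):
        m[a][b] += 1
    return m

def mixture_params(labels, returns, k):
    """Per state i: (c_i, mu_i, sigma_i) = frequency weight, mean and std of its returns."""
    counts = Counter(labels)
    params = []
    for i in range(k):
        r = [x for x, s in zip(returns, labels) if s == i]
        c = counts[i] / len(labels)  # assumption: c_i is the share of days in state i
        mu = sum(r) / len(r) if r else 0.0
        sigma = (sum((x - mu) ** 2 for x in r) / len(r)) ** 0.5 if r else 0.0
        params.append((c, mu, sigma))
    return params

def sample_mixture(params, size, rng):
    """Draw from R ~ sum_i c_i N(mu_i, sigma_i): pick a state by c_i, then sample its Gaussian."""
    weights = [c for c, _, _ in params]
    draws = []
    for _ in range(size):
        _, mu, sigma = rng.choices(params, weights=weights)[0]
        draws.append(rng.gauss(mu, sigma))
    return draws

# Toy daily labels (K = 3) and returns; purely illustrative, not data from the paper.
labels = [0, 0, 1, 1, 1, 2, 0, 1, 2, 2]
returns = [0.01, 0.02, -0.01, 0.0, -0.02, 0.03, 0.01, -0.01, 0.02, 0.04]

M = transition_counts(labels, 3)  # [[1, 2, 0], [0, 2, 2], [1, 0, 1]]
params = mixture_params(labels, returns, 3)
draws = sample_mixture(params, 1000, random.Random(42))
```

The entries of $M$ sum to one less than the number of labelled days, since each transition links two consecutive days.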

Abstract

This work introduces a new framework for modeling financial markets through an interpretable probabilistic state machine. By clustering historical returns based on momentum and risk features across multiple time horizons, we identify distinct market states that capture underlying regimes such as expansion, contraction, crisis, or recovery. From a transition matrix representing the dynamics between these states, we construct a probabilistic state machine that models the temporal evolution of the market. This state machine enables the generation of a custom distribution of returns based on a mixture of Gaussian components weighted by state frequencies. We show that the proposed framework significantly outperforms the traditional approach of modeling returns with a normal distribution in capturing key statistical properties of asset returns, including skewness and kurtosis, and our experiments across randomly selected assets and time periods confirm its robustness.
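
The state machine described in the abstract can be sketched by row-normalizing the transition counts into probabilities, walking the resulting chain, and drawing each day's return from the Gaussian of the current state. The sketch below uses toy counts and per-state parameters (hypothetical values), sends a state with no observed exits back to itself (an assumption), and compares sample skewness and excess kurtosis with a single Gaussian matched on mean and variance, whose excess kurtosis is near zero by construction. It illustrates the mechanism only and does not reproduce the paper's results.

```python
import random

def row_normalize(m):
    """Convert transition counts M into row-stochastic probabilities P."""
    p = []
    for i, row in enumerate(m):
        total = sum(row)
        if total:
            p.append([x / total for x in row])
        else:
            # Assumption: a state never left in the data is treated as absorbing.
            p.append([1.0 if j == i else 0.0 for j in range(len(row))])
    return p

def simulate(p, state_params, start, steps, rng):
    """Walk the state machine, emitting one Gaussian return per visited state."""
    state, path, rets = start, [], []
    for _ in range(steps):
        mu, sigma = state_params[state]
        path.append(state)
        rets.append(rng.gauss(mu, sigma))
        state = rng.choices(range(len(p)), weights=p[state])[0]
    return path, rets

def skew_kurt(x):
    """Sample skewness and excess kurtosis (both near 0 for a large Gaussian sample)."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Hypothetical toy inputs: transition counts and (mu_i, sigma_i) per state.
M = [[1, 2, 0], [0, 2, 2], [1, 0, 1]]
state_params = [(0.0005, 0.007), (0.0, 0.012), (-0.002, 0.03)]
rng = random.Random(0)

P = row_normalize(M)
path, rets = simulate(P, state_params, start=0, steps=20000, rng=rng)

# Baseline: one Gaussian matched to the simulated mean and variance.
mean = sum(rets) / len(rets)
std = (sum((v - mean) ** 2 for v in rets) / len(rets)) ** 0.5
normal = [rng.gauss(mean, std) for _ in range(len(rets))]

print("state machine skew/kurt:", skew_kurt(rets))
print("single normal skew/kurt:", skew_kurt(normal))
```

Because states with different $\sigma_i$ are mixed, the simulated returns show positive excess kurtosis, which a single Gaussian cannot produce; this is the property the abstract refers to.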

Paper Structure

This paper contains 7 sections, 5 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Log-volatility ($\log(\sigma_t)$) versus the mean of the variance-covariance matrix ($\operatorname{mean}(\Sigma_t)$) at different rolling windows. From left to right, $t=5$, $t=20$, and $t=50$. In all three cases, the correlation between $\log(\sigma_t)$ and $\operatorname{mean}(\Sigma_t)$ exceeds 0.83.
  • Figure 2: Cumulative returns and cluster regimes for S&P500 index from 2007 to 2022 (training data).
  • Figure 3: Box plot of momentum indicators of different states identified by the clustering. Each plot corresponds to a specific cluster: from top to bottom, expansion, contraction, crisis, flattening, and recovery.
  • Figure 4: Probability of being in each state
  • Figure 5: Probabilistic State Machine extracted from the transition matrix
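
The figure captions refer to rolling windows of 5, 20 and 50 days and to market states identified by clustering momentum indicators. One minimal way to build such a clustering step is sketched below, assuming the window mean return as the momentum feature, the log of the window standard deviation as the risk feature, z-scored columns, and plain k-means (Lloyd's algorithm). The paper's exact feature set and clustering algorithm are not given in this summary, so every choice here, including the horizons and the helper names (features, standardize, kmeans), is an assumption.

```python
import math
import random

HORIZONS = (5, 20, 50)  # assumed feature horizons, borrowed from the Figure 1 windows

def features(returns, horizons=HORIZONS):
    """One row per day t: [mean return, log std] over the last h days, for each h."""
    rows = []
    for t in range(max(horizons), len(returns) + 1):
        row = []
        for h in horizons:
            w = returns[t - h:t]
            mu = sum(w) / h
            sd = math.sqrt(sum((x - mu) ** 2 for x in w) / h)
            row += [mu, math.log(sd + 1e-12)]  # momentum proxy, risk proxy
        rows.append(row)
    return rows

def standardize(rows):
    """Z-score each feature column so no single horizon dominates the distance."""
    stats = []
    for col in zip(*rows):
        m = sum(col) / len(col)
        s = math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        stats.append((m, s))
    return [[(x - m) / s for x, (m, s) in zip(r, stats)] for r in rows]

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label (market state) per day."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Synthetic returns: a calm stretch followed by a turbulent one (illustrative only).
rng = random.Random(1)
rets = [rng.gauss(0.001, 0.005) for _ in range(300)] + [rng.gauss(-0.002, 0.03) for _ in range(300)]
X = standardize(features(rets))
labels = kmeans(X, k=2)
```

The resulting daily labels are the input from which transition counts and per-state return statistics would be computed; in practice the number of clusters $K$ would be swept, and the TL;DR reports the best results around $K\approx 10$.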