Asset price movement prediction using empirical mode decomposition and Gaussian mixture models

Gabriel R. Palma, Mariusz Skoczeń, Phil Maguire

TL;DR

Predicting asset price movements in non-stationary financial time series is challenging. The paper proposes a unified framework that combines Empirical Mode Decomposition (EMD) and Gaussian Mixture Models (GMM) to extract regime-aware features from XRP, Tesla, and GameStop data. It trains multiple learners on 15-hour windows to forecast the next-hour decision, using the log return of the close price $\omega(t)=\log(y_{\text{close}}(t+1)/y_{\text{close}}(t))$, and evaluates performance with Accumulated Percentage Change ($APC$). Results show that EMD-enhanced features improve predictive performance, especially for ensemble methods like Random Forest and XGBoost, and that GMM filtering expands the set of profitable configurations across markets. Overall, the framework demonstrates a scalable approach to regime-aware signal extraction, with open-source code and a Python package to facilitate replication and extension.
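The log return $\omega(t)$ and a thresholded next-hour decision can be illustrated with a minimal Python sketch. Only the formula for $\omega(t)$ comes from the summary above. The function names, the threshold value, the three-way up/hold/down labelling, and the signed-sum form of $APC$ are assumptions made for illustration and may differ from the paper's definitions.

```python
import math


def log_returns(close):
    """omega(t) = log(close[t+1] / close[t]) for each consecutive pair of closes."""
    return [math.log(nxt / cur) for cur, nxt in zip(close, close[1:])]


def label_movement(omega, threshold=0.001):
    """Hypothetical three-way label: 1 = up, -1 = down, 0 = hold.

    The threshold value is an assumption, not taken from the paper.
    """
    if omega > threshold:
        return 1
    if omega < -threshold:
        return -1
    return 0


def accumulated_percentage_change(close, decisions):
    """Assumed APC: sum of next-hour percentage changes, signed by each decision."""
    total = 0.0
    for t, decision in enumerate(decisions):
        pct_change = 100.0 * (close[t + 1] - close[t]) / close[t]
        total += decision * pct_change
    return total


# Toy hourly close prices (illustrative values only).
close = [100.0, 101.0, 100.5, 100.5, 102.0]
omega = log_returns(close)
labels = [label_movement(w) for w in omega]
apc = accumulated_percentage_change(close, labels)
```

Here the labels are computed from the realised returns, so `apc` is the score a perfect classifier would reach on this toy series; in practice the decisions would come from a trained model.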

Abstract

We investigated the use of Empirical Mode Decomposition (EMD) combined with Gaussian Mixture Models (GMM), feature engineering, and machine learning algorithms to optimize trading decisions. We used five-, two-, and one-year samples of hourly candle data for the GameStop, Tesla, and XRP (Ripple) markets, respectively. Applying a 15-hour rolling window to each market, we collected several features based on a linear model, along with other classical features, to predict the next hour's movement. Subsequently, a GMM filtering approach was used to identify clusters among these markets. For each cluster, we applied the EMD algorithm to extract high, medium, low, and trend components from each feature collected. A simple thresholding algorithm was applied to classify market movements based on the percentage change in each market's close price. We then evaluated the performance of various machine learning models, including Random Forest (RF) and XGBoost, in classifying market movements. A naive random selection of trading decisions, which assumed equal probabilities for each outcome, was used as a benchmark, and a temporal cross-validation approach was used to test models on 40%, 30%, and 20% of the dataset. Our results indicate that transforming selected features using EMD improves performance, particularly for ensemble learning algorithms like Random Forest and XGBoost, as measured by accumulated profit. Finally, GMM filtering expanded the range of learning algorithm and data source combinations that outperformed the top percentile of the random baseline.
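Two of the steps in the abstract, the 15-hour rolling window with a linear-model feature and the temporal hold-out on 40%, 30%, and 20% of the data, can be sketched as follows. This is a hedged illustration, not the authors' implementation: the choice of the least-squares slope as the linear-model feature, the helper names, and the rounding rule for the split point are all assumptions.

```python
def window_slope(values):
    """Ordinary least-squares slope of the values against time steps 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def rolling_slopes(close, window=15):
    """One slope per window of `window` consecutive hourly closes."""
    return [window_slope(close[i - window:i])
            for i in range(window, len(close) + 1)]


def temporal_split(rows, test_fraction):
    """Keep time order: the earliest rows train, the most recent rows test."""
    cut = int(round(len(rows) * (1.0 - test_fraction)))
    return rows[:cut], rows[cut:]


# Toy series rising by 0.5 per hour, so every window slope is 0.5.
close = [100.0 + 0.5 * t for t in range(40)]
slopes = rolling_slopes(close)

# Hold out the last 40%, 30%, and 20% of the rows, as in the abstract.
splits = {fraction: temporal_split(slopes, fraction)
          for fraction in (0.4, 0.3, 0.2)}
```

Splitting by position rather than at random keeps every test row strictly later than every training row, which avoids leaking future prices into training.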

Paper Structure

This paper contains 6 sections, 4 equations, 5 figures, 1 algorithm.

Figures (5)

  • Figure 1: A diagram illustrating the complete proposed approach to predict market movements based on the features extracted from the selected series.
  • Figure 2: Accumulated profit ($APC$) obtained by the learning algorithms per studied market, when using the EMD components and the raw features. The red and black dashed lines represent the average, the $2.5\%$ and $97.5\%$ percentiles of the performance metric using the random algorithm.
  • Figure 3: Accumulated profit ($APC$) obtained by the learning algorithms per cluster presented in the XRP market, when using the EMD components and the raw features. The red and black dashed lines represent the average, the $2.5\%$ and $97.5\%$ percentiles of the performance metric using the random baseline.
  • Figure 4: Accumulated profit ($APC$) obtained by the learning algorithms per cluster presented in the GameStop market, when using the EMD components and the raw features. The red and black dashed lines represent the average, the $2.5\%$ and $97.5\%$ percentiles of the performance metric using the random algorithm.
  • Figure 5: Accumulated profit ($APC$) obtained by the learning algorithms per cluster presented in the Tesla market, when using the EMD components and the raw features. The red and black dashed lines represent the average, the $2.5\%$ and $97.5\%$ percentiles of the performance metric using the random algorithm.
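
The dashed reference lines in the figures summarise a random baseline by its average and its $2.5\%$ and $97.5\%$ percentiles. A minimal sketch of how such a reference band could be computed is given below. It assumes three equally likely decisions (down, hold, up), a signed-sum score, 1000 simulated runs, and a nearest-rank percentile; none of these details are confirmed by the text above, and the input series is synthetic.

```python
import math
import random


def random_baseline_scores(pct_changes, n_runs=1000, seed=0):
    """Score many runs of uniformly random decisions on the same series."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        # Each outcome has equal probability; the three outcomes are an assumption.
        decisions = [rng.choice((-1, 0, 1)) for _ in pct_changes]
        scores.append(sum(d * p for d, p in zip(decisions, pct_changes)))
    return sorted(scores)


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, q in (0, 100]."""
    rank = math.ceil(q / 100.0 * len(sorted_values))
    index = max(0, min(len(sorted_values) - 1, rank - 1))
    return sorted_values[index]


# Synthetic hourly percentage changes (illustrative values only).
pct_changes = [((t * 37) % 11 - 5) / 10.0 for t in range(50)]

scores = random_baseline_scores(pct_changes)
mean_score = sum(scores) / len(scores)
lower = percentile(scores, 2.5)
upper = percentile(scores, 97.5)
```

Under these assumptions, a learning algorithm whose score exceeds `upper` would be counted as outperforming the top percentile of the random baseline.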