Combining supervised and unsupervised learning methods to predict financial market movements

Gabriel Rodrigues Palma; Mariusz Skoczeń; Phil Maguire

TL;DR

This work forecasts financial market movements by combining supervised and unsupervised learning: novel features based on price peaks and their curvature are complemented by Gaussian Mixture Model (GMM) filtering to capture regime structure. The authors evaluate a suite of classifiers (KNN, RF, DNN, polynomial-kernel SVM, XGBoost) on six months of minute-level data from Bitcoin, Pepecoin and Nasdaq, using threshold-based labelling to produce buy/sell/hold signals and temporal cross-validation for evaluation. The results show that GMM-filtered streams combined with the proposed features can improve generalisation and yield higher profitability in certain markets, especially Pepecoin with RF and KNN, highlighting the value of regime-aware feature engineering in financial forecasting. Overall, the study demonstrates the potential of combining features derived from linear models with unsupervised clustering to inform trading decisions across multiple markets.
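A minimal sketch of the GMM filtering step, assuming that each hour is described by one row of engineered features and that "filtering" means splitting the hourly series into one stream per GMM cluster, on which classifiers are then trained separately. It uses scikit-learn's `GaussianMixture`; the four components mirror Figure 2, while `filter_by_gmm`, the synthetic features and the seed are hypothetical choices, not the authors' code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def filter_by_gmm(features, n_clusters=4, seed=0):
    """Split hourly feature rows into per-cluster streams (hypothetical helper).

    features: array of shape (n_hours, n_features), one row per hour, in time order.
    Returns the fitted GaussianMixture and a dict mapping each cluster id to the
    increasing row indices assigned to it, so every cluster stays a time series.
    """
    gmm = GaussianMixture(n_components=n_clusters, covariance_type="full", random_state=seed)
    labels = gmm.fit_predict(features)
    streams = {k: np.flatnonzero(labels == k) for k in range(n_clusters)}
    return gmm, streams


# Synthetic stand-in for engineered hourly features: four separated regimes of 50 hours.
rng = np.random.default_rng(42)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0], [5.0, -5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centres])

gmm, streams = filter_by_gmm(X, n_clusters=4)
for k, idx in streams.items():
    print(f"cluster {k}: {idx.size} hours")
```

Keeping each stream's indices in increasing order means a chronological train/test split can still be applied within every cluster.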

Abstract

The decisions traders make to buy or sell an asset depend on various analyses, and expertise is required to identify patterns that can be exploited for profit. In this paper we identify novel features extracted from emergent and well-established financial markets using linear models and Gaussian Mixture Models (GMM), with the aim of finding profitable opportunities. We used approximately six months of minute-candle data from the Bitcoin, Pepecoin and Nasdaq markets to derive the proposed features and compare them with commonly used ones. For each market, the features were extracted from the previous 59 minutes and used to predict movements over the hour ahead. We explored the performance of various machine learning strategies, such as Random Forests (RF) and K-Nearest Neighbours (KNN), in classifying market movements. As a benchmark, we used a naive random approach that selects trading decisions with all outcomes assumed to be equally likely. We evaluated the learning algorithms with a temporal cross-validation approach, using test sets of 40%, 30% and 20% of the total hours. Our results showed that filtering the time series facilitates the algorithms' generalisation. With the GMM filtering approach, the KNN and RF algorithms produced higher average returns than the random algorithm.
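To make the windowing and evaluation protocol concrete, here is a minimal numpy sketch. It assumes one close price per minute and summarises a 59-minute window by the slope of a least-squares line and the curvature (second derivative) of a least-squares quadratic; these are illustrative stand-ins for the paper's linear-model features, whose exact definitions are not given here. `window_features` and `temporal_split` are hypothetical helpers, and the split reflects one plausible reading of temporal cross-validation: hours stay in chronological order and the last 40%, 30% or 20% form the test set, so no future data leaks into training.

```python
import numpy as np

WINDOW = 59  # minutes of history per prediction, as stated in the abstract


def window_features(close):
    """Illustrative linear-model features for one window of minute closes.

    Fits a straight line and a quadratic to the closes by least squares
    (np.polyfit) and returns the line's slope (trend) and the quadratic's
    second derivative (curvature). These are assumed stand-ins, not the
    paper's exact feature definitions.
    """
    close = np.asarray(close, dtype=float)
    t = np.arange(close.size, dtype=float)
    slope = np.polyfit(t, close, 1)[0]
    curvature = 2.0 * np.polyfit(t, close, 2)[0]  # d2y/dt2 of c*t**2 + d*t + e
    return slope, curvature


def temporal_split(n_hours, test_fraction):
    """Chronological split: the last `test_fraction` of hours forms the test set."""
    n_test = int(round(n_hours * test_fraction))
    cut = n_hours - n_test
    return np.arange(cut), np.arange(cut, n_hours)


# Synthetic 59-minute window with a gently accelerating upward trend.
t = np.arange(WINDOW, dtype=float)
closes = 100.0 + 0.1 * t + 0.002 * t**2
slope, curvature = window_features(closes)
print(f"slope={slope:.4f}, curvature={curvature:.4f}")  # curvature prints 0.0040

# The three test-set sizes used in the paper, applied to a hypothetical 1000 hours.
for frac in (0.4, 0.3, 0.2):
    train_idx, test_idx = temporal_split(1000, frac)
    print(f"test {frac:.0%}: {train_idx.size} train hours, {test_idx.size} test hours")
```

Because the synthetic window is an exact quadratic, the fitted curvature recovers twice its 0.002 coefficient, which is a quick sanity check on the feature code.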

Paper Structure

This paper contains 6 sections, 1 equation, 6 figures and 2 tables.

Introduction
Methods
Results and discussion
Conclusion
Acknowledgments
Declarations

Figures (6)

Figure 1: Illustration of the trading decisions produced by a simple symmetric threshold algorithm based on the $4\%$ quantile of $\mathbf{\Omega}_m$.
Figure 2: Gaussian mixture model filtering approach applied to the Bitcoin market using the proposed features. The diagram shows the resulting time series clustered into $4$ groups.
Figure 3: Mean individual accuracy of the ML algorithms per studied market, using the original and standardised data. The red and black dashed lines mark the average and the $2.5\%$ and $97.5\%$ percentiles of the performance metric obtained with the random algorithm.
Figure 4: Mean individual accuracy of the ML algorithms per GMM-filtered time series, with clusters estimated from the proposed features, using the original and standardised data. The red and black dashed lines mark the average and the $2.5\%$ and $97.5\%$ percentiles of the performance metric obtained with the random algorithm.
Figure 5: Accumulated percentage change of the ML algorithms per studied market, using the original and standardised data. The red and black dashed lines mark the average and the $2.5\%$ and $97.5\%$ percentiles of the performance metric obtained with the random algorithm.
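The figures rest on three computations: symmetric threshold labelling (Figure 1), a uniformly random benchmark summarised by its mean and 2.5%/97.5% percentiles (Figures 3 to 5), and an accumulated percentage change (Figure 5). The numpy sketch below shows one way to compute them. It assumes that $\mathbf{\Omega}_m$ is the set of hourly percentage changes of market $m$, that the threshold is the absolute value of its 4% quantile, and that a buy (sell) decision earns $+r$ ($-r$) while hold earns nothing; these readings, the synthetic data and all helper names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

BUY, HOLD, SELL = 1, 0, -1  # decisions encoded as the sign of the position taken


def symmetric_threshold_labels(pct_changes, q=0.04):
    """Buy/sell/hold labels from a symmetric threshold (assumed reading of Figure 1).

    tau is the absolute value of the q-quantile of the hourly percentage changes;
    changes above +tau are labelled BUY, below -tau SELL, and the rest HOLD.
    """
    pct_changes = np.asarray(pct_changes, dtype=float)
    tau = abs(np.quantile(pct_changes, q))
    labels = np.full(pct_changes.shape, HOLD)
    labels[pct_changes > tau] = BUY
    labels[pct_changes < -tau] = SELL
    return labels, tau


def accumulated_change(decisions, pct_changes):
    """Total percentage change captured: +r for BUY, -r for SELL, 0 for HOLD."""
    return float(np.sum(np.asarray(decisions) * np.asarray(pct_changes)))


def summarise(values):
    """Mean and 2.5%/97.5% percentiles, matching the dashed lines in Figures 3 to 5."""
    lo, hi = np.percentile(values, [2.5, 97.5])
    return float(np.mean(values)), float(lo), float(hi)


def random_benchmark(labels, pct_changes, n_sims=1000, seed=0):
    """Accuracy and accumulated change of decisions drawn uniformly from BUY/HOLD/SELL."""
    rng = np.random.default_rng(seed)
    options = np.array([BUY, HOLD, SELL])
    acc = np.empty(n_sims)
    gain = np.empty(n_sims)
    for i in range(n_sims):
        decisions = rng.choice(options, size=len(labels))
        acc[i] = np.mean(decisions == labels)
        gain[i] = accumulated_change(decisions, pct_changes)
    return summarise(acc), summarise(gain)


# Synthetic hourly percentage changes (standard normal), not real market data.
rng = np.random.default_rng(1)
pct = rng.normal(0.0, 1.0, size=2000)
labels, tau = symmetric_threshold_labels(pct)
(acc_mean, acc_lo, acc_hi), (gain_mean, gain_lo, gain_hi) = random_benchmark(labels, pct, n_sims=500)
print(f"tau={tau:.3f}")
print(f"random accuracy: mean {acc_mean:.3f}, 95% band [{acc_lo:.3f}, {acc_hi:.3f}]")
print(f"random accumulated change: mean {gain_mean:.1f}, 95% band [{gain_lo:.1f}, {gain_hi:.1f}]")
```

Under these assumptions the random algorithm's accuracy centres near 1/3 whatever the label balance, since each uniformly drawn decision matches the true label with probability 1/3, and its expected accumulated change is zero; a classifier is only informative if it clears the upper percentile band.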
