Online High-Frequency Trading Stock Forecasting with Automated Feature Clustering and Radial Basis Function Neural Networks

Adamantios Ntakaris, Gbenga Ibikunle

TL;DR

The paper addresses online high-frequency trading stock forecasting by proposing a fully autonomous protocol that jointly automates feature importance and clustering. It combines a dual MDI and GD feature-importance mechanism with k-means guided clustering to configure an RBFNN regressor for mid-price prediction on Level-1 LOB data, with an online, tick-by-tick evaluation framework. Key contributions include integrating MDI and a GD-based feature importance as competitive signals, converting them into correlation-based distance matrices, and dynamically selecting the number of RBF centers via silhouette scores, all without manual grid search. The approach enables stock-specific input spaces and online adaptation, offering a path toward faster, autonomous HFT forecasting with reduced human intervention and improved responsiveness.

Abstract

This study presents an autonomous experimental machine learning protocol for high-frequency trading (HFT) stock price forecasting that involves a dual competitive feature importance mechanism and clustering via shallow neural network topology for fast training. By incorporating the k-means algorithm into the radial basis function neural network (RBFNN), the proposed method addresses the challenges of manual clustering and the reliance on potentially uninformative features. More specifically, our approach involves a dual competitive mechanism for feature importance, combining the mean-decrease impurity (MDI) method and a gradient descent (GD) based feature importance mechanism. This approach, tested on HFT Level 1 limit order book (LOB) data for 20 S&P 500 stocks, enhances the forecasting ability of the RBFNN regressor. Our findings suggest that an autonomous approach to feature selection and clustering is crucial, as each stock requires a different input feature space. Overall, by automating the feature selection and clustering processes, we remove the need for manual topological grid search and provide a more efficient way to predict the LOB mid-price.
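
To make the idea of a k-means-configured RBFNN concrete, the following is a minimal sketch, not the authors' implementation. It assumes Euclidean k-means on the raw samples (the paper instead clusters on feature-importance-weighted, correlation-based distance matrices), a deterministic farthest-point initialisation, a least-squares output layer, and a hypothetical search range of 2 to 6 centers. All function names (`kmeans`, `silhouette`, `select_k`, `fit_rbf`, `predict_rbf`) are illustrative. The number of centers is picked by the highest mean silhouette score, so no manual grid search over the topology is needed.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with farthest-point initialisation (an assumption, for determinism)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    return centers, labels

def silhouette(X, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton cluster: score 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)          # mean intra-cluster distance
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()

def select_k(X, k_max=6):
    """Pick the cluster count in [2, k_max] with the highest silhouette score."""
    best = None
    for k in range(2, k_max + 1):
        centers, labels = kmeans(X, k)
        score = silhouette(X, labels)
        if best is None or score > best[0]:
            best = (score, k, centers, labels)
    return best[1], best[2], best[3]

def rbf_features(X, centers, widths):
    """Gaussian activations exp(-||x - c||^2 / (2 * sigma^2)) for each center."""
    d2 = ((X[:, None] - centers[None]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths ** 2))

def fit_rbf(X, y, centers, labels):
    """Set each width to the cluster's RMS distance to its center; fit linear output weights."""
    widths = np.array([
        np.sqrt(((X[labels == j] - centers[j]) ** 2).sum(axis=1).mean())
        if np.any(labels == j) else 1.0
        for j in range(len(centers))])
    widths = np.maximum(widths, 1e-3)  # guard against zero-width neurons
    Phi = np.column_stack([rbf_features(X, centers, widths), np.ones(len(X))])
    weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return widths, weights

def predict_rbf(X, centers, widths, weights):
    Phi = np.column_stack([rbf_features(X, centers, widths), np.ones(len(X))])
    return Phi @ weights

# Synthetic demo data (not LOB data): three well-separated blobs.
rng = np.random.default_rng(0)
blob_means = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([m + 0.3 * rng.standard_normal((40, 2)) for m in blob_means])
y = X[:, 0] - X[:, 1] + 0.05 * rng.standard_normal(len(X))

k, centers, labels = select_k(X)
widths, weights = fit_rbf(X, y, centers, labels)
pred = predict_rbf(X, centers, widths, weights)
print("selected centers:", k)
print("in-sample MSE:", float(np.mean((pred - y) ** 2)))
```

Solving the output layer in closed form keeps training fast, which matches the abstract's emphasis on a shallow topology. Whether the paper trains the output weights this way is not stated in this summary.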

Paper Structure

This paper contains 8 sections, 17 equations, 1 figure, 7 tables, 2 algorithms.

Figures (1)

  • Figure 1: Overview of the fully automated protocol. From left to right: the first part of the online experimental protocol transforms the LOB data into sliding-window data blocks. Each data block is fed sequentially to the fully automated mechanism, in which two competitive pipelines (MDI and GD) provide the feature importance vectors. The clustering block then determines the optimal number of clusters from the input matrix weighted by the feature importance vectors (i.e., the correlation distance-based matrix). The clusters in turn determine the centroids and standard deviations of the RBF neurons. The number of clusters changes continually (i.e., online) with the latest input feature set.
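
The first stage in Figure 1, turning the LOB stream into sliding-window blocks that are processed one tick at a time, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the column layout (best bid, best ask, bid size, ask size), the window length, and the function names `mid_price` and `online_blocks` are hypothetical, and a naive carry-forward forecast stands in for the RBFNN so that the loop stays self-contained.

```python
import numpy as np

def mid_price(best_bid, best_ask):
    """Mid-price: the average of the best bid and best ask quotes."""
    return (np.asarray(best_bid, dtype=float) + np.asarray(best_ask, dtype=float)) / 2.0

def online_blocks(lob, window):
    """Yield (X_block, y_block, x_now, y_next) for every tick after the warm-up.

    Assumed layout: lob[:, 0] is the best bid and lob[:, 1] the best ask.
    Each training row is paired with the mid-price one tick later, and only
    values observed up to tick t are used, so there is no look-ahead.
    """
    mid = mid_price(lob[:, 0], lob[:, 1])
    for t in range(window, len(lob) - 1):
        X_block = lob[t - window:t]              # the last `window` ticks of features
        y_block = mid[t - window + 1:t + 1]      # their next-tick mid-prices
        yield X_block, y_block, lob[t], mid[t + 1]

# Synthetic Level-1 stream (illustrative values only).
ticks = np.arange(10)
bid = 100.0 + 0.01 * ticks
ask = bid + 0.02
lob = np.column_stack([bid, ask, 200 + 10 * ticks, 180 + 5 * ticks])

errors = []
for X_block, y_block, x_now, y_next in online_blocks(lob, window=4):
    # Stand-in for the model: in the protocol, feature importance, clustering,
    # and RBFNN fitting would be redone here on (X_block, y_block).
    forecast = y_block[-1]                       # carry the latest known mid-price forward
    errors.append(abs(forecast - y_next))

print("blocks:", len(errors), "mean abs error:", float(np.mean(errors)))
```

Because every block is rebuilt from the latest ticks, anything fitted inside the loop, including the number of clusters, can change from one block to the next, which is the online behaviour the caption describes.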