Online High-Frequency Trading Stock Forecasting with Automated Feature Clustering and Radial Basis Function Neural Networks
Adamantios Ntakaris, Gbenga Ibikunle
TL;DR
The paper addresses online high-frequency trading (HFT) stock forecasting by proposing a fully autonomous protocol that jointly automates feature importance and clustering. It combines a dual feature-importance mechanism, based on mean decrease impurity (MDI) and gradient descent (GD), with k-means-guided clustering to configure a radial basis function neural network (RBFNN) regressor for mid-price prediction on Level-1 limit order book (LOB) data, evaluated within an online, tick-by-tick framework. Key contributions include using MDI and a GD-based feature-importance mechanism as competing signals, converting them into correlation-based distance matrices, and selecting the number of RBF centers dynamically via silhouette scores, all without manual grid search. The approach yields stock-specific input spaces and online adaptation, offering a path toward faster, autonomous HFT forecasting with reduced human intervention and improved responsiveness.
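The silhouette-guided choice of RBF centers can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: clustering is plain Euclidean k-means (k-means++ seeding, several restarts) on the input matrix rather than on the paper's correlation-based distance matrices, the basis functions are Gaussians sharing one width set by the common heuristic sigma = d_max / sqrt(2k), and the output weights are solved by linear least squares. The feature-importance stage is not reproduced, and `choose_num_centers` and `RBFNN` are hypothetical names.

```python
import numpy as np


def _nearest(X, centers):
    # Index of the closest center for every row of X (Euclidean distance).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)


def _kmeanspp_init(X, k, rng):
    # k-means++ seeding: each new center is drawn with probability proportional
    # to its squared distance from the nearest center chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        p = d2 / total if total > 0 else None  # None means uniform sampling
        centers.append(X[rng.choice(len(X), p=p)])
    return np.array(centers, dtype=float)


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the lowest-inertia restart."""
    rng = np.random.default_rng(seed)
    best = None
    for _run in range(n_init):
        centers = _kmeanspp_init(X, k, rng)
        for _step in range(n_iter):
            labels = _nearest(X, centers)
            # Move each center to its cluster mean; an empty cluster keeps its center.
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = _nearest(X, centers)
        inertia = ((X - centers[labels]) ** 2).sum()
        if best is None or inertia < best[0]:
            best = (inertia, centers, labels)
    return best[1], best[2]


def silhouette(X, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for one cluster; treat as the worst score
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean intra-cluster distance, self excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return scores.mean()


def choose_num_centers(X, k_values=range(2, 9), seed=0):
    """Pick the k whose k-means partition has the highest mean silhouette."""
    scores = {k: silhouette(X, kmeans(X, k, seed=seed)[1]) for k in k_values}
    return max(scores, key=scores.get), scores


class RBFNN:
    """Gaussian RBF layer with fixed centers plus a linear output layer with bias."""

    def __init__(self, centers, width=None):
        self.centers = np.asarray(centers, dtype=float)
        if width is None:
            k = len(self.centers)
            dmax = max(np.linalg.norm(a - b) for a in self.centers for b in self.centers)
            width = dmax / np.sqrt(2 * k) if dmax > 0 else 1.0
        self.width = width

    def _phi(self, X):
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        phi = np.exp(-d2 / (2 * self.width ** 2))
        return np.hstack([phi, np.ones((len(X), 1))])  # append bias column

    def fit(self, X, y):
        self.w, *_ = np.linalg.lstsq(self._phi(X), y, rcond=None)
        return self

    def predict(self, X):
        return self._phi(X) @ self.w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    means = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
    X = np.vstack([m + 0.3 * rng.standard_normal((30, 2)) for m in means])
    y = X.sum(axis=1)
    k, _ = choose_num_centers(X, range(2, 7))
    centers, _ = kmeans(X, k)
    model = RBFNN(centers).fit(X, y)
    print(k, np.mean((model.predict(X) - y) ** 2))
```

Because only the output weights are learned, fitting reduces to one least-squares solve once the centers are fixed, which is what keeps a shallow RBF topology fast to retrain.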
Abstract
This study presents an autonomous experimental machine learning protocol for high-frequency trading (HFT) stock price forecasting that combines a dual competitive feature-importance mechanism with clustering inside a shallow neural network topology, allowing fast training. By incorporating the k-means algorithm into the radial basis function neural network (RBFNN), the proposed method addresses the challenges of manual clustering and of relying on potentially uninformative features. More specifically, the dual competitive feature-importance mechanism combines the mean decrease impurity (MDI) method with a gradient descent (GD) based feature-importance mechanism. Tested on HFT Level 1 order book data for 20 S&P 500 stocks, this approach enhances the forecasting ability of the RBFNN regressor. Our findings suggest that an autonomous approach to feature selection and clustering is crucial, as each stock requires a different input feature space. Overall, by automating the feature selection and clustering processes, we remove the need for a manual topological grid search and provide a more efficient way to predict the mid-price of the limit order book (LOB).
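The forecasting target is the LOB mid-price, and the TL;DR describes an online, tick-by-tick evaluation. A minimal standard-library sketch of both follows, assuming the standard definition of the mid-price as the average of the best bid and best ask, and assuming a predict-then-reveal protocol scored by MAE and RMSE; the paper's exact evaluation metrics and update schedule are not given here. `L1Quote`, `online_evaluate`, `last_mid`, and `moving_average` are hypothetical names, and the two predictors are simple baselines, not the paper's RBFNN.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class L1Quote:
    """One Level-1 snapshot (hypothetical record; sizes omitted)."""
    bid: float  # best bid price
    ask: float  # best ask price


def mid_price(q: L1Quote) -> float:
    # Mid-price: the average of the best bid and best ask.
    return (q.bid + q.ask) / 2.0


Predictor = Callable[[Sequence[float]], float]


def online_evaluate(quotes: Sequence[L1Quote], predict: Predictor,
                    warmup: int = 1) -> Tuple[float, float]:
    """Predict-then-reveal evaluation, one tick at a time.

    At tick t the predictor sees only mid-prices 0..t (at least `warmup` of
    them), forecasts the mid-price at t + 1, and is scored when it arrives.
    Returns (MAE, RMSE).
    """
    if warmup < 1 or len(quotes) <= warmup:
        raise ValueError("need warmup >= 1 and more quotes than warmup ticks")
    mids = [mid_price(q) for q in quotes]
    errors: List[float] = []
    for t in range(warmup - 1, len(mids) - 1):
        forecast = predict(mids[: t + 1])  # history ends at tick t: no look-ahead
        errors.append(forecast - mids[t + 1])
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


def last_mid(history: Sequence[float]) -> float:
    # Naive random-walk baseline: the next mid-price equals the current one.
    return history[-1]


def moving_average(window: int) -> Predictor:
    # Baseline: mean of the last `window` revealed mid-prices.
    def predict(history: Sequence[float]) -> float:
        recent = history[-window:]
        return sum(recent) / len(recent)
    return predict
```

Any callable that maps the revealed history to a one-step forecast can be passed as `predict`, so a regressor retrained as each tick arrives would plug into the same loop. Slicing the history on every tick is quadratic overall, which is acceptable for a sketch but not for full trading-day streams.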
