Minimal Batch Adaptive Learning Policy Engine for Real-Time Mid-Price Forecasting in High-Frequency Trading

Adamantios Ntakaris; Gbenga Ibikunle

TL;DR

The paper tackles real-time mid-price forecasting in high-frequency trading using NASDAQ Level 1 LOB data for 100 stocks. It introduces ALPE, a batch-free, online RL agent that uses an MLP policy-value network and adaptive epsilon decay to predict mid-price adjustments directly from the current LOB state. Two feature spaces (Simple and Extended) and three input variants (Raw, MDI, GD) are evaluated against baselines (Naive, ARIMA, MLP, CNN, LSTM, GRU, RBFNN) using MSE, RMSE, and the proposed Relative RMSE (RRMSE), a normalized error metric, with ALPE consistently achieving superior accuracy. The work also demonstrates ALPE's robustness across trading volumes and suggests future directions, including multi-agent RL and Level 2 LOB processing, to broaden applicability in real-time market environments.

Abstract

High-frequency trading (HFT) has transformed modern financial markets, making reliable short-term price forecasting models essential. In this study, we present a novel approach to mid-price forecasting using Level 1 limit order book (LOB) data from NASDAQ, focusing on 100 U.S. stocks from the S&P 500 index during the period from September to November 2022. Expanding on our previous work with Radial Basis Function Neural Networks (RBFNN), which leveraged automated feature importance techniques based on mean decrease impurity (MDI) and gradient descent (GD), we introduce the Adaptive Learning Policy Engine (ALPE), a reinforcement learning (RL)-based agent designed for batch-free, immediate mid-price forecasting. ALPE incorporates adaptive epsilon decay to dynamically balance exploration and exploitation, and it outperforms a diverse range of highly effective machine learning (ML) and deep learning (DL) models in forecasting performance.
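To make the forecasting target concrete, the following is a minimal sketch of how a mid-price series could be derived from Level 1 quotes and scored with MSE and RMSE. The mid-price is the standard average of best bid and best ask. The persistence rule used here (the next mid-price equals the current one) is only an assumed stand-in for a naive baseline, and the function names and quote values are hypothetical, not taken from the paper.

```python
import math

def mid_price(best_bid, best_ask):
    """Mid-price of a Level 1 LOB state: average of best bid and best ask."""
    return (best_bid + best_ask) / 2.0

def persistence_errors(mids):
    """Score a persistence forecast (assumed baseline): predict mids[t+1] = mids[t].

    Returns (MSE, RMSE) over all one-step-ahead predictions.
    """
    predictions = mids[:-1]
    actuals = mids[1:]
    squared = [(a - p) ** 2 for a, p in zip(actuals, predictions)]
    mse = sum(squared) / len(squared)
    return mse, math.sqrt(mse)

# Hypothetical (best_bid, best_ask) quotes, for illustration only.
quotes = [(100.00, 100.02), (100.01, 100.03), (100.01, 100.05), (100.02, 100.04)]
mids = [mid_price(bid, ask) for bid, ask in quotes]
mse, rmse = persistence_errors(mids)
```

RRMSE is deliberately left out of this sketch, since its definition is not given in this part of the text.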

Paper Structure (12 sections, 4 equations, 4 figures, 16 tables)

Introduction
Literature Review
Methodology and Experimental Setup
Forecasting Objective
Data Preprocessing and Feature Engineering
Reinforcement Learning – Deep Policy Value Learning
Markov Decision Process Representation
Exploitation Network Architecture
Policy Value Approximation with Minimal Training
Novelty of the Proposed RL Framework
Results and Discussion
Conclusion

Figures (4)

Figure 1: The figure provides an overview of our experimental protocol, which is structured around two main blocks: Simple and Extended feature sets. The process begins with the Input block, where the HFT trader utilizes each input block consisting of a specific number of LOB states, referred to as Overlapping Feature Block Inputs. These blocks sequentially feed raw LOB data into the two feature sets. The Simple feature set is based on the best level (i.e., price level 1) of LOB data, while the Extended feature set consists of handcrafted kernelized features designed to capture more complex relationships from the data. For both the Simple and Extended sets, three different inputs are used: Raw data, MDI-adjusted features, and GD-adjusted features. Each input type is passed through various regressors (i.e., baseline regressor, ARIMA, MLP, CNN, LSTM, GRU, RBFNN, and ALPE), and their performance is evaluated using metrics such as MSE, RMSE, and RRMSE. This framework is applied to each of the 100 stocks, ensuring a thorough evaluation of the different feature sets and models.
Figure 2: RRMSE scores for all six input datasets (i.e., Simple, Simple MDI, Simple GD, Exte, Exte MDI, Exte GD) and each model (i.e., ARIMA, Naive regressor, MLP, CNN, LSTM, GRU, RBFNN, and ALPE) are reported for all stocks except WBD, which is omitted because it would overscale the y-axis. The performance details for WBD can be found in Table 13.
Figure 3: Percentage error reduction between RMSE and RRMSE across stocks, plotted against total volume. The majority of stocks are concentrated in the top left quadrant, indicating a significant error reduction for lower-volume stocks, which supports the use of RRMSE over RMSE. Note: A subset of stock names is annotated to maintain clarity in the scatter plot.
Figure 4: Performance-based volume profiling: The graph shows the volume distribution per stock, classified according to their best ALPE-based RRMSE across the six datasets. Note: The following stocks are not fully visible due to image scaling: VRSN, ZBRA, SBAC, POOL, and ORLY for Simple; ILMN, TECH, and REGN for Exte GD; ODFL and EQIX for Exte MDI.
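The protocol described in the Figure 1 caption can be sketched as an enumeration of feature set, input variant, and model combinations, fed by overlapping blocks of LOB states. This is a rough illustration only: the block size, the step, and the helper names `overlapping_blocks` and `experiment_grid` are assumptions, since the caption does not specify them.

```python
from itertools import product

# Labels taken from the Figure 1 caption.
FEATURE_SETS = ["Simple", "Extended"]
INPUT_VARIANTS = ["Raw", "MDI", "GD"]
MODELS = ["Naive", "ARIMA", "MLP", "CNN", "LSTM", "GRU", "RBFNN", "ALPE"]

def overlapping_blocks(states, block_size, step=1):
    """Split a sequence of LOB states into overlapping blocks.

    block_size and step are hypothetical parameters; consecutive blocks
    share (block_size - step) states.
    """
    last_start = len(states) - block_size
    return [states[i:i + block_size] for i in range(0, last_start + 1, step)]

def experiment_grid():
    """All (feature set, input variant, model) combinations run per stock."""
    return list(product(FEATURE_SETS, INPUT_VARIANTS, MODELS))

grid = experiment_grid()
# Integers stand in for LOB states in this toy example.
blocks = overlapping_blocks(list(range(6)), block_size=3)
```

With two feature sets, three input variants, and eight regressors, the grid holds 48 configurations per stock, each of which would be scored with MSE, RMSE, and RRMSE.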
