HLOB -- Information Persistence and Structure in Limit Order Books

Antonio Briola, Silvia Bartolucci, Tomaso Aste

TL;DR

HLOB presents a novel approach to high-frequency LOB mid-price forecasting by combining a TMFG-based Information Filtering Network to reveal higher-order dependencies among volume levels with Homological Convolutional Neural Networks, augmented by an LSTM for temporal dynamics. The model is evaluated against nine state-of-the-art baselines on 15 NASDAQ stocks from 2017–2019, using three prediction horizons. Results show HLOB achieving strong performance, particularly at short horizons and for large-tick stocks, while MI-based spatial analyses provide insights into how information distribution across LOB levels affects predictability. The study highlights the value of incorporating topological priors into DL architectures for microstructure modeling and outlines avenues for refining topology and temporal evolution in future work.

Abstract

We introduce a novel large-scale deep learning model for forecasting Limit Order Book mid-price changes, which we name 'HLOB'. This architecture (i) exploits the information encoded by an Information Filtering Network, namely the Triangulated Maximally Filtered Graph, to unveil deeper and non-trivial dependency structures among volume levels; and (ii) guarantees deterministic design choices to handle the complexity of the underlying system by drawing inspiration from the groundbreaking class of Homological Convolutional Neural Networks. We test our model against 9 state-of-the-art deep learning alternatives on 3 real-world Limit Order Book datasets, each including 15 stocks traded on the NASDAQ exchange, and we systematically characterize the scenarios where HLOB outperforms state-of-the-art architectures. Our approach sheds new light on the spatial distribution of information in Limit Order Books and on its degradation over increasing prediction horizons, narrowing the gap between microstructural modeling and deep learning-based forecasting in high-frequency financial markets.

Paper Structure (14 sections, 3 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Schematic representation of the TMFG's building process: (a) we start from a simplified version of the LOB containing only volume data; (b) we mitigate the noise affecting the LOB by categorizing volumes into bins of uniform size; (c) we compute the pairwise MI between volume levels; and (d) we build the TMFG using the MI matrix as input. In the proposed graph representation, the color and size of both nodes and edges depend on their betweenness centrality. The color bar is the same for the MI matrix and the corresponding TMFG representation.
  • Figure 2: This diagram illustrates the sequence of steps leading (a) from the output of the TMFG building process (b) to the input of the HLOB model. To construct the TMFG, we use only the volume levels of the LOB, forming a network characterized by three topological structures: tetrahedra, triangles, and edges. To prepare the inputs for the HLOB model, we perform two main tasks: (i) for each timestamp in the input's temporal dimension, we flatten each of these three sets; (ii) we incorporate the corresponding price levels' data into each element of the three resulting input sets. Note that there is a direct mapping between the colors used in this figure and the ones used to highlight the inputs of the HLOB model in Figure 3.
  • Figure 3: Visual overview of the HLOB model's operational framework. Note that there is a direct mapping between the colors used to denote the inputs of the HLOB model here and the colors used to represent the three categories of topological priors derived from a TMFG in Figure 2.
  • Figure 4: Distribution of $p_{\text{T}}$ (see \cite{briola2024deep}) as a function of the total number of executed round-trip transactions (TT) computed for each model in Table \ref{tab:models_summary} at $\text{H}\Delta_\tau \in \{10, 50, 100\}$.
  • Figure 5: Normalized (over the $15$ stocks in Table \ref{tab:stocks_introduction}) version of the average (computed across the $3$-year analysis period) MI matrices computed on small-tick stocks (i.e., CHTR, GOOG, GS, IBM, MCD, NVDA). For readability, we rename LOB volume levels according to the mapping $v_\ell^{\text{ask}} \rightarrow \text{A}\ell$, $v_\ell^{\text{bid}} \rightarrow \text{B}\ell$.
  • ...and 2 more figures
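
The pipeline described in the caption of Figure 1 (bin the volumes, compute pairwise MI, build the TMFG) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the equal-width binning rule, the number of bins, the plug-in MI estimator, the simplified tetrahedron seeding (four nodes with the largest row sums) and the synthetic gamma-distributed volumes are all choices made here for illustration, and the function names `bin_volumes`, `mi_matrix` and `tmfg_edges` are hypothetical.

```python
import itertools
import numpy as np


def bin_volumes(volumes, n_bins=10):
    """Discretize each volume column into equal-width bins (assumed binning rule)."""
    binned = np.empty(volumes.shape, dtype=int)
    for j in range(volumes.shape[1]):
        col = volumes[:, j]
        edges = np.linspace(col.min(), col.max(), n_bins + 1)
        # Use interior edges only, so that labels fall in 0..n_bins-1.
        binned[:, j] = np.digitize(col, edges[1:-1])
    return binned


def mutual_information(x, y):
    """Plug-in MI estimate (in nats) between two integer-labelled series."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1.0)          # joint histogram of label pairs
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (a, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, b)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())


def mi_matrix(binned):
    """Symmetric matrix of pairwise MI between columns (diagonal left at zero)."""
    n = binned.shape[1]
    mi = np.zeros((n, n))
    for i, j in itertools.combinations(range(n), 2):
        mi[i, j] = mi[j, i] = mutual_information(binned[:, i], binned[:, j])
    return mi


def tmfg_edges(w):
    """Greedy TMFG sketch: seed a tetrahedron, then insert each node into a face."""
    n = w.shape[0]
    # Simplified seeding (assumption): the four nodes with the largest row sums.
    seed = [int(v) for v in np.argsort(w.sum(axis=1))[-4:]]
    edges = {frozenset(e) for e in itertools.combinations(seed, 2)}
    faces = [tuple(f) for f in itertools.combinations(seed, 3)]
    remaining = set(range(n)) - set(seed)
    while remaining:
        # Choose the (node, face) pair with the largest total weight to the face.
        v, face = max(
            ((v, f) for v in remaining for f in faces),
            key=lambda vf: w[vf[0], list(vf[1])].sum(),
        )
        remaining.remove(v)
        faces.remove(face)
        a, b, c = face
        # Inserting v splits the chosen triangle into three new triangular faces.
        faces += [(a, b, v), (a, c, v), (b, c, v)]
        edges |= {frozenset((v, u)) for u in face}
    return sorted(tuple(sorted(e)) for e in edges)


# Synthetic stand-in for LOB volumes: 2000 snapshots, 12 hypothetical volume levels.
rng = np.random.default_rng(0)
volumes = rng.gamma(2.0, 100.0, size=(2000, 12))
mi = mi_matrix(bin_volumes(volumes))
edges = tmfg_edges(mi)
print(len(edges))  # a TMFG on n nodes has 3 * (n - 2) edges
```

Each insertion adds one node and three edges, so the resulting planar graph always has 3(n - 2) edges regardless of the data. On real LOB data the MI matrix would replace the synthetic one, and the tetrahedra, triangles, and edges of the resulting graph would provide the three input sets described in the caption of Figure 2.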