Label Unbalance in High-frequency Trading

Zijian Zhao, Xuming Zhang, Jiayu Wen, Mingwen Liu, Xiaoteng Ma

TL;DR

The work tackles label imbalance in high-frequency trading return prediction and proposes an end-to-end deep learning framework that adjusts for imbalanced labels. It evaluates three backbone models (MLP, LSTM, Mamba) on 20 days of 0.5-second futures data with 13 features and implements under-sampling and cost-sensitive losses to bolster minority-class performance. Results show that LSTM and Mamba often outperform MLP, with imbalance-aware losses (e.g., sensitive loss, weighted loss) providing robust improvements while resampling and focal losses have mixed effects. The study demonstrates credible 1-minute return prediction in the Chinese futures market and provides a public codebase for reproducibility and further exploration.
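As a rough illustration of two of the loss families named above, the sketch below writes out the standard textbook forms of a class-weighted cross-entropy and a focal loss for a single binary prediction. It is not taken from the paper's codebase: the function names, the weights `w_pos` and `w_neg`, and the default `gamma` are assumptions chosen for illustration, and the paper's own formulations may differ.

```python
import math


def weighted_cross_entropy(p, y, w_pos=1.0, w_neg=1.0, eps=1e-12):
    """Class-weighted binary cross-entropy for one sample.

    p is the predicted probability of the positive class, y is 0 or 1.
    w_pos and w_neg are illustrative class weights (assumed, not from the paper).
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p))


def focal_loss(p, y, gamma=2.0, eps=1e-12):
    """Standard binary focal loss for one sample.

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified samples,
    so hard samples contribute relatively more to the total.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    # A confident, correct prediction on the minority (positive) class.
    print(weighted_cross_entropy(0.9, 1, w_pos=3.0))
    print(focal_loss(0.9, 1))
```

Raising `w_pos` makes every minority-class error more expensive, whereas the focal term reweights by difficulty rather than by class, which is one plausible reason the two can behave differently on the same data.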

Abstract

In financial trading, return prediction is one of the foundations of a successful trading system. With the rapid development of deep learning in areas such as image processing and natural language processing, it has also demonstrated a significant edge in handling financial data. While the success of deep learning relies on a large number of labeled samples, labeling each time point or event as profitable or unprofitable after transaction costs, especially in high-frequency trading, suffers from a serious label imbalance problem. In this paper, we adopt a rigorous end-to-end deep learning framework with comprehensive label imbalance adjustment methods and succeed in predicting high-frequency returns in the Chinese futures market. The code for our method is publicly available at https://github.com/RS2002/Label-Unbalance-in-High-Frequency-Trading .
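To make the source of the imbalance concrete, here is a minimal sketch of how labeling under a transaction cost skews the class distribution, followed by two common adjustments: inverse-frequency class weights and random under-sampling. The return values, the `cost` threshold, and all function names are hypothetical and are not taken from the paper or its repository.

```python
import random
from collections import Counter


def label_returns(returns, cost):
    """Label a return 1 (profitable) only if it exceeds the assumed cost."""
    return [1 if r > cost else 0 for r in returns]


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def under_sample(samples, labels, seed=0):
    """Randomly drop majority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for y, items in by_class.items():
        balanced.extend((s, y) for s in rng.sample(items, n_min))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Hypothetical forward returns; most do not clear the assumed cost.
    returns = [0.0001, -0.0002, 0.0008, 0.0, 0.0003, -0.0005, 0.0012, 0.0001]
    labels = label_returns(returns, cost=0.0005)
    print(Counter(labels))
    print(inverse_frequency_weights(labels))
    print(under_sample(returns, labels))
```

In this toy series only two of eight returns clear the threshold, so the profitable class is already a small minority; a higher cost or a shorter horizon would presumably make it rarer still.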

Paper Structure

This paper contains 27 sections, 14 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Sample Cost Matrix
  • Figure 2: Galar et al.’s proposed taxonomy for ensembles to address the imbalanced data classification problem
  • Figure 3: Workflow of Proposed Method
  • Figure 4: Enter Caption
  • Figure 5: Long Short Term Memory (LSTM) cell
  • ...and 2 more figures