Learn to Rank Risky Investors: A Case Study of Predicting Retail Traders' Behaviour and Profitability

Weixian Waylon Li, Tiejun Ma

TL;DR

This work reframes risky trader detection in CFD markets as a learning-to-rank problem. It introduces PA-RiskRanker, which is trained with a Profit-Aware BCE (PA-BCE) loss and uses a Self-Cross-Trader Attention pipeline to capture intra- and inter-trader dynamics. By embedding profit signals directly into the ranking objective and modeling feature interactions, the approach improves both ranking metrics and monetary impact over state-of-the-art ranking models and traditional classification baselines: an 8.4% higher F1 score than state-of-the-art ranking models such as Rankformer, and a 10%-17% higher average profit than all benchmark models. The study also adopts a two-step evaluation framework that feeds the ranking scores into interpretable classifiers, improving predictive performance and decision transparency. The results suggest practical value for market makers making real-time hedging decisions and point to extensions to related financial risk domains and multimodal data.

Abstract

Identifying risky traders with high profits in financial markets is crucial for market makers, such as trading exchanges, to ensure effective risk management through real-time decisions on regulation compliance and hedging. However, capturing the complex and dynamic behaviours of individual traders poses significant challenges. Traditional classification and anomaly detection methods often establish a fixed risk boundary, failing to account for this complexity and dynamism. To tackle this issue, we propose a profit-aware risk ranker (PA-RiskRanker) that reframes the problem of identifying risky traders as a ranking task using Learning-to-Rank (LETOR) algorithms. Our approach features a Profit-Aware binary cross entropy (PA-BCE) loss function and a transformer-based ranker enhanced with a self-cross-trader attention pipeline. These components effectively integrate profit and loss (P&L) considerations into the training process while capturing intra- and inter-trader relationships. Our research critically examines the limitations of existing deep learning-based LETOR algorithms in trading risk management, which often overlook the importance of P&L in financial scenarios. By prioritising P&L, our method improves risky trader identification, achieving an 8.4% increase in F1 score compared to state-of-the-art (SOTA) ranking models like Rankformer. Additionally, it demonstrates a 10%-17% increase in average profit compared to all benchmark models.

Paper Structure

This paper contains 45 sections, 2 theorems, 18 equations, 6 figures, 9 tables, 1 algorithm.

Key Result

Proposition 1

Pairwise ranking approaches inherently produce a balanced class distribution by creating equal numbers of positive and negative labels for each pairwise comparison when each trader is associated with distinct future profits.
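
The balance claim in Proposition 1 can be checked numerically. The sketch below is an illustration, not the authors' construction: it assumes that every ordered pair of traders (i, j) with i ≠ j forms one training example, labelled 1 when trader i has the higher future profit and 0 otherwise. The function name `pairwise_labels` and the profit values are hypothetical.

```python
from itertools import permutations


def pairwise_labels(future_profits):
    """Label every ordered pair (i, j), i != j.

    Assumed labelling rule: 1 if trader i has a strictly higher future
    profit than trader j, else 0.
    """
    return [
        1 if future_profits[i] > future_profits[j] else 0
        for i, j in permutations(range(len(future_profits)), 2)
    ]


# Hypothetical future profits in GBP; only one trader is an extreme
# high earner, so a fixed-threshold classification label would be
# heavily imbalanced (1 positive vs 4 negatives).
profits = [120.0, -45.5, 310_000.0, 8.25, -900.0]

labels = pairwise_labels(profits)
positives = sum(labels)
negatives = len(labels) - positives

print(len(labels), positives, negatives)  # 20 10 10
```

With distinct profits, each unordered pair contributes exactly one positive and one negative ordered pair, so the two classes are equal in size regardless of how skewed the profit distribution is. If two traders had identical future profits, both of their ordered pairs would be labelled 0 under this rule and the balance would no longer be exact, which is why the proposition requires distinct profits.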

Figures (6)

  • Figure 1: Comparison of classification, anomaly detection, and ranking approaches. Classification and anomaly detection rely on fixed decision boundaries, while ranking models use relative risk scores, better aligning with the dynamic market conditions.
  • Figure 2: Architecture of the PA-RiskRanker pipeline. Continuous features are processed through a linear layer, while categorical features are encoded using FT-Embedding. The [CLS] token in the self-trader attention mechanism captures feature dependencies, generating contextual embeddings. Cross-trader attention then analyses trader interrelationships to detect risky patterns, forming the $\mathbf{G}_{score}$ matrix. The pipeline utilises the NextTotalPL_GBP information for $\mathbf{G}_{P\&L}$ matrix construction and is trained with PA-BCE loss for precise ranking alignment.
  • Figure 3: Proportion of correctly identified risky traders segmented by future profit ranges: £50K–£150K, £150K–£300K, and above £300K, across selected benchmark models.
  • Figure 4: Feature importance of the second step XGBoost Classifier with $10^3$ trees. "fst_step_scores" stands for the predictive scores generated from the first step PA-RiskRanker.
  • Figure 5: Feature importance of the second step LightGBM Classifier with $10^3$ trees. "fst_step_scores" stands for the predictive scores generated from the first step PA-RiskRanker.
  • ...and 1 more figure

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2