Revisiting Ensemble Methods for Stock Trading and Crypto Trading Tasks at ACM ICAIF FinRL Contest 2023-2024

Nikolaus Holzer; Keyi Wang; Kairong Xiao; Xiao-Yang Liu Yanglet

TL;DR

The paper tackles policy instability and sampling bottlenecks in financial reinforcement learning by combining ensemble methods with massively parallel GPU simulations. It introduces vectorized market environments and GPU-resident data storage to achieve up to $1{,}746\times$ sampling speedups on a single GPU and demonstrates that ensembles can yield robust, risk-aware performance in both stock and cryptocurrency trading. Key contributions include KL-divergence-driven agent diversity, two tailored ensemble designs (weighted-action for stocks and majority-vote for crypto), and comprehensive evaluations showing improved risk metrics such as reduced maximum drawdown and enhanced Sharpe ratios. The work has practical implications for scalable, robust FinRL training and proposes future directions like zero-knowledge proof validation for model provenance in trading contexts.
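The two ensemble designs named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the tie-breaking rule, and the per-agent weights are hypothetical, and how the paper actually derives its weights is not stated in this summary.

```python
import numpy as np


def majority_vote(actions: np.ndarray) -> int:
    """Pick the discrete action chosen by the most agents.

    actions: 1-D integer array, one action id per agent.
    Ties resolve to the smallest action id (an assumption of this sketch).
    """
    values, counts = np.unique(actions, return_counts=True)
    return int(values[np.argmax(counts)])


def weighted_action(actions: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Blend continuous actions as a normalized weighted average.

    actions: (n_agents, n_assets) array of per-agent actions.
    weights: (n_agents,) non-negative scores (hypothetical, e.g. validation
             performance); they are normalized to sum to 1 here.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ np.asarray(actions, dtype=float)


# Three of four agents' votes: action 2 wins with two votes.
vote = majority_vote(np.array([0, 2, 2, 1]))

# Two agents trading two assets; the first agent carries 75% of the weight.
blend = weighted_action(np.array([[1.0, -1.0], [0.0, 1.0]]), np.array([3.0, 1.0]))
```

Majority voting suits a discrete action space (for example buy, hold, sell), while averaging suits continuous per-asset position sizes, which is one plausible reading of why the two tasks use different designs.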

Abstract

Reinforcement learning has demonstrated great potential for performing financial tasks. However, it faces two major challenges: policy instability and sampling bottlenecks. In this paper, we revisit ensemble methods with massively parallel simulations on graphics processing units (GPUs), significantly enhancing the computational efficiency and robustness of trained models in volatile financial markets. Our approach leverages the parallel processing capability of GPUs to significantly improve the sampling speed for training ensemble models. The ensemble models combine the strengths of component agents to improve the robustness of financial decision-making strategies. We conduct experiments in both stock and cryptocurrency trading tasks to evaluate the effectiveness of our approach. Massively parallel simulation on a single GPU improves the sampling speed by up to $1{,}746\times$ using $2{,}048$ parallel environments compared to a single environment. The ensemble models have high cumulative returns and outperform some individual agents, reducing maximum drawdown by up to $4.17\%$ and improving the Sharpe ratio by up to $0.21$. This paper describes trading tasks at ACM ICAIF FinRL Contests in 2023 and 2024.
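For readers unfamiliar with the two risk metrics quoted above, the following sketch shows one common way to compute them. The function names, the sample standard deviation (`ddof=1`), the zero risk-free rate, and the 252-period annualization are assumptions of this sketch; the paper's exact conventions are not given in this summary.

```python
import numpy as np


def max_drawdown(equity: np.ndarray) -> float:
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)  # highest value seen so far
    drawdowns = (running_peak - equity) / running_peak
    return float(drawdowns.max())


def sharpe_ratio(returns: np.ndarray, risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = np.asarray(returns, dtype=float) - risk_free
    std = excess.std(ddof=1)
    if std == 0.0:
        return 0.0  # undefined for constant returns; this sketch returns 0
    return float(np.sqrt(periods_per_year) * excess.mean() / std)


# The curve peaks at 120 and falls to 90, a drawdown of 30 / 120 = 0.25.
mdd = max_drawdown(np.array([100.0, 120.0, 90.0, 110.0]))
```

A lower maximum drawdown and a higher Sharpe ratio both indicate a better risk profile, which is the sense in which the abstract reports improvements.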

Paper Structure (31 sections, 6 equations, 5 figures, 3 tables)

Introduction
Related Works
Financial Reinforcement Learning
Ensemble Learning
Simulation Environments
Problem Description
Problem Formulation for FinRL Tasks
Training Process
Effectiveness and Burdens of Ensemble Methods
Effectiveness and Costs of Ensemble Methods
Challenge of Extensive Sampling
Massively Parallel Simulation
Simulation Phase for Gradient Estimate
Massively Parallel Market Environments
Parallelism of Simulation Phase
...and 16 more sections

Figures (5)

Figure 1: Performance deviation for different RL algorithms and a simple ensemble method.
Figure 2: Producer-Consumer model for RL.
Figure 3: Ensemble methods.
Figure 4: Samples per second for the stock trading task and the cryptocurrency trading task. NVIDIA A100 GPU is used.
Figure 5: Cumulative returns of different strategies for the stock trading task and cryptocurrency trading task.
