Cryptocurrency Portfolio Management with Reinforcement Learning: Soft Actor-Critic and Deep Deterministic Policy Gradient Algorithms

Kamal Paykan

TL;DR

This work addresses adaptive cryptocurrency portfolio management under highly volatile, nonstationary conditions using deep reinforcement learning (DRL) agents. It compares two continuous-action DRL algorithms, Soft Actor-Critic (SAC) and Deep Deterministic Policy Gradient (DDPG), within a single trading environment, using an LSTM-based feature extractor to model temporal dependencies. Empirically, SAC delivers better risk-adjusted performance and stability than both DDPG and a Markowitz mean-variance baseline, which points to the benefit of entropy regularization in volatile markets. The findings suggest that data-driven reinforcement learning strategies can provide robust, adaptive portfolio management for digital assets. Future work aims to incorporate multimodal data and more advanced architectures to improve scalability and interpretability.

Abstract

This paper proposes a reinforcement learning-based framework for cryptocurrency portfolio management using the Soft Actor-Critic (SAC) and Deep Deterministic Policy Gradient (DDPG) algorithms. Traditional portfolio optimization methods often struggle to adapt to the highly volatile and nonlinear dynamics of cryptocurrency markets. To address this, we design an agent that learns continuous trading actions directly from historical market data through interaction with a simulated trading environment. The agent optimizes portfolio weights to maximize cumulative returns while minimizing downside risk and transaction costs. Experimental evaluations on multiple cryptocurrencies demonstrate that the SAC and DDPG agents outperform baseline strategies such as equal-weighted and mean-variance portfolios. The SAC algorithm, with its entropy-regularized objective, is more stable and robust than DDPG in noisy market conditions. These results highlight the potential of deep reinforcement learning for adaptive, data-driven portfolio management in cryptocurrency markets.
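
To make the idea of "continuous trading actions" that set portfolio weights under transaction costs more concrete, the following is a minimal sketch of a single environment step. It is not the paper's implementation: the softmax mapping from actions to long-only weights, the proportional cost on turnover, the log-return reward, and the names `action_to_weights`, `step_reward`, and `cost_rate` are all assumptions chosen for illustration, and the downside-risk term mentioned in the abstract is omitted.

```python
import numpy as np


def action_to_weights(action):
    """Map an unconstrained action vector to long-only weights that sum to 1.

    Assumption: a softmax mapping; the paper may use a different normalization.
    """
    z = np.asarray(action, dtype=float)
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def step_reward(prev_weights, action, asset_returns, cost_rate=0.001):
    """Return (new_weights, reward) for one rebalancing step.

    Assumptions: the reward is the log of the net portfolio growth factor, and
    transaction costs are proportional to turnover (sum of absolute weight
    changes) at the hypothetical rate `cost_rate`.
    """
    w = action_to_weights(action)
    turnover = np.abs(w - np.asarray(prev_weights, dtype=float)).sum()
    gross = float(np.dot(w, np.asarray(asset_returns, dtype=float)))
    net = (1.0 + gross) * (1.0 - cost_rate * turnover) - 1.0
    return w, float(np.log1p(net))


if __name__ == "__main__":
    prev = [0.25, 0.25, 0.25, 0.25]      # current holdings in four assets
    action = [0.5, 0.1, -0.2, 0.0]       # raw actor output (illustrative values)
    returns = [0.02, -0.01, 0.005, 0.0]  # one-period simple returns (illustrative)
    weights, reward = step_reward(prev, action, returns)
    print(weights, reward)
```

In a setup like this, both SAC and DDPG would emit the raw `action` vector; the difference between them lies in how the policy is trained (stochastic with an entropy bonus versus deterministic with exploration noise), not in how the action is turned into weights.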

Paper Structure

This paper contains 34 sections, 53 equations, 8 figures, 4 tables, and 2 algorithms.

Figures (8)

  • Figure 1: Data preprocessing workflow applied to OHLCV cryptocurrency data.
  • Figure 2: Temporal split of training and testing periods.
  • Figure 3: Structure of the feature extraction network. Each $\rho_{t-W+2:t}$ sequence is processed independently by an identical network producing latent features $v_t$.
  • Figure 4: Training and testing performance of the forecasting network. The green and purple regions denote the training and test periods, respectively. Red lines show true values; blue lines show predicted values.
  • Figure 5: Normalized prices (blue) and 20-day rolling Sharpe ratios (red) for BTC-USD, ETH-USD, LTC-USD, and DOGE-USD.
  • ...and 3 more figures