Reinforcement Learning for Portfolio Optimization with a Financial Goal and Defined Time Horizons

Fermat Leukam; Rock Stephane Koffi; Prudence Djagba

TL;DR

This work tackles goal-based portfolio optimization with a defined time horizon by leveraging G-learning, an entropy-regularized extension of Q-learning, to maximize wealth by a target date while minimizing periodic contributions. It introduces GIRL, an inverse reinforcement learning method, to infer reward-function parameters from observed trajectories, and compares direct G-learning with GIRL parameter learning. Empirical results in a high-volatility, diversified portfolio show a Sharpe ratio improvement to about $0.48$, with GIRL providing only marginal gains over the G-learning baseline, suggesting robustness of the approach. The study highlights the practical value of probabilistic RL in financial decision-making, offering a flexible framework that aligns asset management with investor-specific goals and time horizons.

Abstract

This research proposes an enhancement to the portfolio optimization approach based on the G-Learning algorithm, combined with parametric optimization via the GIRL algorithm (the G-learning approach to Inverse Reinforcement Learning), as presented in prior work. The goal is to maximize portfolio value by a target date while minimizing the investor's periodic contributions. Our model operates in a highly volatile market with a well-diversified portfolio, which keeps the investor's risk level low, and it uses reinforcement learning to adjust portfolio positions dynamically over time. Results show that we improved the Sharpe ratio from 0.42, as reported by recent studies using the same approach, to 0.483, a notable achievement in highly volatile markets with diversified portfolios. The comparison between G-Learning and GIRL reveals that while GIRL optimizes the reward-function parameters (e.g., lambda = 0.0012 compared to 0.002), its impact on portfolio performance remains marginal. This suggests that reinforcement learning methods such as G-Learning already enable robust optimization on their own. This research contributes to the growing body of reinforcement learning applications in financial decision-making, demonstrating that probabilistic learning algorithms can effectively align portfolio management strategies with investor needs.
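To make the phrase "entropy-regularized extension of Q-learning" concrete, the following is a minimal tabular sketch of a finite-horizon G-learning backward recursion. It is not the paper's portfolio model: the tiny random MDP, the uniform prior policy, and the names `g_learning_backward`, `R`, `P`, `prior`, `beta`, and `gamma` are all assumptions chosen for illustration. The sketch uses the standard reward-based form, in which the free energy is $F(s) = \frac{1}{\beta}\log\sum_a \pi_0(a|s)\,e^{\beta G(s,a)}$ and the policy is $\pi(a|s) = \pi_0(a|s)\,e^{\beta (G(s,a) - F(s))}$.

```python
import numpy as np

def g_learning_backward(R, P, prior, beta, gamma, T):
    """Finite-horizon tabular G-learning (illustrative sketch).

    R:     (S, A) expected one-step rewards (hypothetical toy values)
    P:     (S, A, S) transition probabilities
    prior: (S, A) reference policy pi_0, rows sum to 1
    beta:  inverse temperature; large beta approaches hard-max Q-learning
    Returns the free energy at t = 0 and the list of policies for t = 0..T-1.
    """
    S, A = R.shape
    F = np.zeros(S)  # terminal condition F_T = 0 (an assumption of this sketch)
    policies = []
    for _ in range(T):
        # G-function: immediate reward plus discounted expected free energy
        G = R + gamma * (P @ F)
        # Numerically stable log-sum-exp weighted by the prior policy
        z = beta * G
        m = z.max(axis=1, keepdims=True)
        w = prior * np.exp(z - m)
        norm = w.sum(axis=1, keepdims=True)
        F = (m[:, 0] + np.log(norm[:, 0])) / beta
        # Policy pi = pi_0 * exp(beta * (G - F)), already normalized
        policies.append(w / norm)
    policies.reverse()  # computed backward in time, returned forward
    return F, policies

# Toy usage on a random MDP (all numbers are made up for illustration)
rng = np.random.default_rng(0)
S, A, T = 4, 3, 5
R = rng.normal(size=(S, A))
P = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)
prior = np.full((S, A), 1.0 / A)
F_soft, pols = g_learning_backward(R, P, prior, beta=1.0, gamma=0.95, T=T)
```

With a small `beta` the policy stays close to the prior, while a large `beta` concentrates it on the action with the highest G-value, which is the sense in which G-learning generalizes Q-learning.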
