Language Model Guided Reinforcement Learning in Quantitative Trading
Adam Darmanin, Vince Vella
TL;DR
The paper addresses myopic and opaque reinforcement learning (RL) policies in algorithmic trading with a hybrid framework in which large language models (LLMs) supply high-level, economically grounded trading strategies that guide RL agents. The design is modular: an uncertainty-weighted scalar is added to the RL observation, so LLM-driven signals can influence decisions without altering the underlying RL algorithm. Two experiments are reported. The first shows that well-engineered prompts can substantially improve the risk-adjusted returns and confidence of LLM-generated strategies, measured by Sharpe Ratio ($SR$), perplexity ($PPL$), and LLM entropy ($H_{LLM}$). The second shows that an LLM-guided RL agent can outperform an RL-only baseline on multiple assets, although gains in downside risk, measured by Maximum Drawdown ($MDD$), are more variable. The work supports modular, agentic planning in finance and suggests that reward shaping and mixture-of-experts architectures could further stabilize and improve performance in real-time trading settings.
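To make the observation-augmentation idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the summary does not give the exact weighting formula, so the choice of confidence as one minus normalized entropy, and the names `normalized_entropy`, `llm_guidance_scalar`, and `augment_observation`, are assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a probability distribution, scaled to [0, 1]."""
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def llm_guidance_scalar(direction: float, probs: Sequence[float]) -> float:
    """Weight an LLM trade direction in [-1, 1] by its confidence.

    ASSUMPTION: confidence = 1 - normalized entropy of the LLM's output
    distribution. The paper's actual weighting may differ.
    """
    confidence = 1.0 - normalized_entropy(probs)
    return direction * confidence


def augment_observation(
    obs: Sequence[float], direction: float, probs: Sequence[float]
) -> List[float]:
    """Return the RL observation with the guidance scalar appended.

    The RL algorithm itself is untouched; only the observation grows by one.
    """
    return list(obs) + [llm_guidance_scalar(direction, probs)]


if __name__ == "__main__":
    market_features = [0.12, -0.03, 0.50]  # hypothetical price features
    # Fairly confident long signal: probability mass concentrated on one outcome.
    print(augment_observation(market_features, 1.0, [0.9, 0.1]))
    # Maximally uncertain signal: the appended scalar shrinks to about zero.
    print(augment_observation(market_features, 1.0, [0.5, 0.5]))
```

Under this assumed weighting, an uncertain LLM contributes a scalar near zero, so the agent falls back on its market features, while a confident LLM contributes a value near +1 or -1.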
Abstract
Algorithmic trading requires short-term tactical decisions consistent with long-term financial objectives. Reinforcement Learning (RL) has been applied to such problems, but adoption is limited by myopic behaviour and opaque policies. Large Language Models (LLMs) offer complementary strategic reasoning and multi-modal signal interpretation when guided by well-structured prompts. This paper proposes a hybrid framework in which LLMs generate high-level trading strategies to guide RL agents. We evaluate (i) the economic rationale of LLM-generated strategies through expert review, and (ii) the performance of LLM-guided agents against unguided RL baselines using Sharpe Ratio (SR) and Maximum Drawdown (MDD). Empirical results indicate that LLM guidance improves both return and risk metrics relative to standard RL.
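The two evaluation metrics named in the abstract can be sketched in a few lines of Python. This is an illustrative sketch using the standard textbook definitions. The annualization factor, the zero default risk-free rate, and the function names `sharpe_ratio` and `max_drawdown` are assumptions, since the abstract does not state the conventions the authors used.

```python
import math
from typing import Sequence


def sharpe_ratio(
    returns: Sequence[float],
    risk_free: float = 0.0,
    periods_per_year: int = 252,
) -> float:
    """Annualized Sharpe Ratio from per-period returns.

    ASSUMPTION: sample standard deviation (n - 1) and 252 trading periods
    per year; the paper may use a different convention.
    """
    excess = [r - risk_free for r in returns]
    n = len(excess)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(excess) / n
    variance = sum((x - mean) ** 2 for x in excess) / (n - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        return float("nan")  # undefined when returns never vary
    return mean / std * math.sqrt(periods_per_year)


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


if __name__ == "__main__":
    curve = [100.0, 120.0, 90.0, 110.0, 130.0]  # hypothetical equity curve
    print(max_drawdown(curve))  # fall from 120 to 90, a 25% drawdown
    print(sharpe_ratio([0.01, 0.03], periods_per_year=1))
```

A higher $SR$ indicates better risk-adjusted return, while a lower $MDD$ indicates less severe downside risk, so the two metrics can move independently of each other.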
