FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading

Guojun Xiong, Zhiyang Deng, Keyi Wang, Yupeng Cao, Haohang Li, Yangyang Yu, Xueqing Peng, Mingquan Lin, Kaleb E Smith, Xiao-Yang Liu, Jimin Huang, Sophia Ananiadou, Qianqian Xie

TL;DR

FLAG-Trader addresses the challenge of sequential, multimodal decision-making in financial markets by unifying an LLM-based policy with gradient-based reinforcement learning. It employs a partially fine-tuned LLM as the actor and a shared backbone for the critic, trained online with PPO and a text-based state representation guided by a carefully designed prompt. The approach highlights a parameter-efficient fine-tuning strategy and a shared architecture that enables small open-source LLMs to match or exceed larger models in trading tasks, supported by extensive empirical evaluation across multiple assets. This work suggests that integrating LLMs with reward-driven optimization can yield robust, scalable financial decision systems with practical impact for real-time trading and related tasks.

Abstract

Large language models (LLMs) fine-tuned on multimodal financial data have demonstrated impressive reasoning capabilities in various financial tasks. However, they often struggle with multi-step, goal-oriented scenarios in interactive financial markets, such as trading, where complex agentic approaches are required to improve decision-making. To address this, we propose FLAG-Trader, a unified architecture integrating linguistic processing (via LLMs) with gradient-driven reinforcement learning (RL) policy optimization, in which a partially fine-tuned LLM acts as the policy network, leveraging pre-trained knowledge while adapting to the financial domain through parameter-efficient fine-tuning. Through policy gradient optimization driven by trading rewards, our framework not only enhances LLM performance in trading but also improves results on other financial-domain tasks. We present extensive empirical evidence to validate these enhancements.

Paper Structure

This paper contains 16 sections, 24 equations, 3 figures, 3 tables, 2 algorithms.

Figures (3)

  • Figure 1: A high-level overview of our LLM-based reinforcement learning setup for financial trading. The environment provides the current state $s_t$. A prompt containing task details, the action space, and the current state is fed into the LLM, which outputs a trading action $a_t$. The action is executed in the environment, yielding a reward $r(s_t, a_t)$ and next state $s_{t+1}$. The log-likelihood $\log \pi_\theta(a_t \mid \texttt{lang}(s_t))$ is then leveraged by a policy gradient method (e.g., PPO), with experience tuples stored in a replay buffer for iterative updates.
  • Figure 2: The FLAG-Trader pipeline for financial trading, utilizing an LLM-based actor-critic architecture. The LLM consists of frozen base layers $\theta_{\texttt{frozen}}$ that retain pre-trained knowledge and trainable top layers $\theta_{\texttt{train}}$ for financial decision-making. Both the Policy_Net and Value_Net share these trainable layers while maintaining a separate policy head $\theta_P$ and value head $\theta_V$, which are updated by a policy gradient method.
  • Figure 3: The format of the input prompt. It contains the task description, the legal action set, the current state description, and the output action format.
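
The prompt layout described in the Figure 3 caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' template: the `MarketState` fields, the wording of each prompt part, the `Buy`/`Sell`/`Hold` action set, and the helper names `build_prompt` and `parse_action` are all assumptions introduced here.

```python
from dataclasses import dataclass

# Assumed discrete action set, used only for illustration.
ACTIONS = ("Buy", "Sell", "Hold")


@dataclass
class MarketState:
    """Hypothetical state fields; the captions do not enumerate them."""
    ticker: str
    price: float
    cash: float
    shares: int


def build_prompt(state: MarketState) -> str:
    """Assemble the four parts named in the caption: task description,
    action set, current state description, and output action format."""
    task = "You are a trading agent. Choose one action to maximize return."
    actions = "Allowed actions: " + ", ".join(ACTIONS)
    current = (
        f"Current state: ticker={state.ticker}, price={state.price:.2f}, "
        f"cash={state.cash:.2f}, shares={state.shares}"
    )
    output_format = "Answer with exactly one word from the allowed actions."
    return "\n".join([task, actions, current, output_format])


def parse_action(reply: str, default: str = "Hold") -> str:
    """Map free-form model text to an allowed action, else return default."""
    for token in reply.replace(".", " ").replace(",", " ").split():
        for action in ACTIONS:
            if token.lower() == action.lower():
                return action
    return default
```

In a loop like the one in Figure 1, `build_prompt` would play the role of $\texttt{lang}(s_t)$ and `parse_action` would turn the LLM's text output into $a_t$; falling back to a default action when the reply cannot be parsed is a design choice made for this sketch, not something stated in the source.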

Theorems & Definitions (1)

  • Remark 4.1