FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading

Guojun Xiong; Zhiyang Deng; Keyi Wang; Yupeng Cao; Haohang Li; Yangyang Yu; Xueqing Peng; Mingquan Lin; Kaleb E Smith; Xiao-Yang Liu; Jimin Huang; Sophia Ananiadou; Qianqian Xie

TL;DR

FLAG-Trader addresses the challenge of sequential, multimodal decision-making in financial markets by unifying an LLM-based policy with gradient-based reinforcement learning. It uses a partially fine-tuned LLM as the actor and a critic that shares the same backbone, trained online with PPO on a text-based state representation guided by a carefully designed prompt. The approach combines a parameter-efficient fine-tuning strategy with a shared architecture, which enables small open-source LLMs to match or exceed larger models in trading tasks, as supported by extensive empirical evaluation across multiple assets. This work suggests that integrating LLMs with reward-driven optimization can yield robust, scalable financial decision systems with practical impact for real-time trading and related tasks.
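
To make the PPO reference above concrete, here is a minimal sketch of the standard clipped surrogate objective for a single sample. It is not the authors' implementation: the three-way discrete action set, the `clip_eps` value, and the function names `softmax` and `ppo_clip_objective` are assumptions made for illustration. In the paper's setting the logits would come from the LLM policy head; here they are plain lists.

```python
import math


def softmax(logits):
    """Convert raw action logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ppo_clip_objective(new_logits, old_logits, action, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized).

    new_logits / old_logits: action logits under the current and the
    data-collecting policy. The action indices (assumed here to be
    0, 1, 2 for three discrete trading actions) are hypothetical.
    """
    new_p = softmax(new_logits)[action]
    old_p = softmax(old_logits)[action]
    ratio = new_p / old_p  # probability ratio between new and old policy
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Take the pessimistic (lower) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


# Identical policies give a ratio of 1, so the objective equals the advantage.
same = ppo_clip_objective([0.1, 0.2, 0.3], [0.1, 0.2, 0.3], action=1, advantage=0.5)

# A large policy shift with positive advantage is capped at (1 + clip_eps) * advantage.
capped = ppo_clip_objective([5.0, 0.0, 0.0], [0.0, 0.0, 0.0], action=0, advantage=1.0)
```

The clipping bounds how far one update can move the policy away from the one that collected the data, which is one reason PPO is a common choice when the policy network is large and expensive to retrain.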

Abstract

Large language models (LLMs) fine-tuned on multimodal financial data have demonstrated impressive reasoning capabilities in various financial tasks. However, they often struggle with multi-step, goal-oriented scenarios in interactive financial markets, such as trading, where complex agentic approaches are required to improve decision-making. To address this, we propose FLAG-Trader, a unified architecture integrating linguistic processing (via LLMs) with gradient-driven reinforcement learning (RL) policy optimization, in which a partially fine-tuned LLM acts as the policy network, leveraging pre-trained knowledge while adapting to the financial domain through parameter-efficient fine-tuning. Through policy gradient optimization driven by trading rewards, our framework not only enhances LLM performance in trading but also improves results on other financial-domain tasks. We present extensive empirical evidence to validate these enhancements.
