QuantAgent: Seeking Holy Grail in Trading by Self-Improving Large Language Model

Saizhuo Wang; Hang Yuan; Lionel M. Ni; Jian Guo

TL;DR

This work addresses the challenge of equipping autonomous LLM agents with domain knowledge for quantitative finance. It introduces a principled two-layer framework whose inner and outer loops autonomously build and refine a domain knowledge base (KB). The inner loop performs iterative writer–judge refinement within a simulated knowledge environment, while the outer loop tests outputs in the real world and feeds the results back into the KB, with theoretical efficiency guarantees obtained by framing the problem as an MDP and applying offline pessimism analyses. QuantAgent, the instantiated agent for financial signal mining, demonstrates self-improvement: as backtesting-like evaluations accumulate, it generates and refines a diverse set of predictive signals (alphas) with improving predictive power and relevance. The results suggest a scalable, self-sustaining approach to knowledge-enhanced trading agents, with potential applicability to decision-making domains beyond finance.

Abstract

Autonomous agents based on Large Language Models (LLMs) that devise plans and tackle real-world challenges have gained prominence. However, tailoring these agents for specialized domains like quantitative investment remains a formidable task. The core challenge involves efficiently building and integrating a domain-specific knowledge base for the agent's learning process. This paper introduces a principled framework to address this challenge, comprising a two-layer loop. In the inner loop, the agent refines its responses by drawing from its knowledge base, while in the outer loop, these responses are tested in real-world scenarios to automatically enhance the knowledge base with new insights. We demonstrate that our approach enables the agent to progressively approximate optimal behavior with provable efficiency. Furthermore, we instantiate this framework through an autonomous agent for mining trading signals named QuantAgent. Empirical results showcase QuantAgent's capability in uncovering viable financial signals and enhancing the accuracy of financial forecasts.
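
The two-layer loop described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `KnowledgeBase`, `inner_loop`, `outer_loop`, the retrieval rule, the score threshold, and the toy writer, judge, and evaluator are all assumptions made for illustration. In the paper, the writer and judge are LLMs and the evaluator is the real environment.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Record = Tuple[str, float]  # (answer, real-world feedback)


@dataclass
class KnowledgeBase:
    """Hypothetical KB: a flat list of past answers and their feedback."""
    records: List[Record] = field(default_factory=list)

    def retrieve(self, k: int = 3) -> List[Record]:
        # Assumed retrieval rule: the k best-scoring past answers.
        return sorted(self.records, key=lambda r: r[1], reverse=True)[:k]

    def add(self, answer: str, feedback: float) -> None:
        self.records.append((answer, feedback))


def inner_loop(task: str, kb: KnowledgeBase, writer: Callable, judge: Callable,
               max_steps: int = 3, threshold: float = 0.8) -> str:
    """Writer drafts, judge scores; stop at max_steps or a high enough score."""
    buffer: list = []  # context buffer of (answer, score, advice)
    answer = ""
    for _ in range(max_steps):
        answer = writer(task, kb.retrieve(), buffer)
        score, advice = judge(task, answer, kb.retrieve())
        buffer.append((answer, score, advice))
        if score >= threshold:
            break
    return answer


def outer_loop(task: str, kb: KnowledgeBase, writer: Callable, judge: Callable,
               evaluate: Callable[[str], float], rounds: int = 5) -> KnowledgeBase:
    """Submit each refined answer for evaluation and store the feedback."""
    for _ in range(rounds):
        answer = inner_loop(task, kb, writer, judge)
        kb.add(answer, evaluate(answer))
    return kb


# Toy stand-ins so the sketch runs without an LLM or a backtester.
def toy_writer(task, knowledge, buffer):
    return f"draft-{len(buffer)}"


def toy_judge(task, answer, knowledge):
    return (0.4, "refine") if answer == "draft-0" else (0.9, "accept")


def toy_evaluate(answer):
    return len(answer) / 10.0
```

Calling `outer_loop("find a momentum alpha", KnowledgeBase(), toy_writer, toy_judge, toy_evaluate)` runs five outer rounds, each of which stores one evaluated answer in the KB. Keeping the judge separate from the evaluator mirrors the distinction between cheap simulated review and costly real-world testing.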

Paper Structure (49 sections, 3 theorems, 5 equations, 5 figures, 2 algorithms)

Introduction
Related Works
LLM-based Autonomous Agents
Adapting LLM Agents to Domain-specific Tasks
Self-improving LLM agents
Framework
The Inner Reasoning Loop
Components
Knowledge Base
Context Buffer
Writer
Judge
Procedure
One Iteration
Iterative Process
...and 34 more sections

Key Result

Lemma 4.3

The Bayesian regret of the planning agent in the inner loop is sublinear in the number of inner-loop iterations $T$.
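
As a reading aid, sublinearity can be stated as follows. The symbol $\mathrm{Regret}(T)$ for the cumulative Bayesian regret after $T$ iterations is assumed notation; the paper's own symbol and its explicit bound are not shown in this summary.

$$
\mathrm{Regret}(T) = o(T)
\quad\Longleftrightarrow\quad
\lim_{T \to \infty} \frac{\mathrm{Regret}(T)}{T} = 0
$$

In words, the average regret per inner-loop iteration vanishes as $T$ grows.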

Figures (5)

Figure 1: Our proposed framework. Left: The outer feedback loop. The agent generates an answer to the problem, submits it to the real environment for evaluation, and receives feedback. The feedback is written into the agent's knowledge base (KB) for later use. Right: The pipeline of the inner reasoning loop. During each loop iteration, the following steps occur in order: ① The writer retrieves relevant knowledge from the KB to inform the initial answer; ② The writer generates an answer using the retrieved knowledge; ③ The judge retrieves relevant knowledge from the KB to inform its review; ④ The answer is reviewed, receiving a score and advice for improvement, which are then used in the next iteration. Both the writer and the judge are LLMs with specifically designed prompts, and each step in the iteration uses the buffer contents as context. This iterative process stops when the maximum number of steps ($T$) is reached or the score is high enough.
Figure 2: The analysis framework. Left: An abstract view of the inner and outer loop, where three different environments (real environment $\theta$, simulated environment $\bar{\theta}$ and the LLM-inferred environment $\hat{\theta}$) are characterized. Right: The chain of proofs that links the agent's actual policy $\hat{\pi}$ and the optimal policy in the real environment $\pi^*$
Figure 3: Evolving curve of averaged grouped single-alpha performance. From left to right: entry valid ratio, IC score, return, Sharpe ratio
Figure 4: Trading idea relevance comparison. From left to right: with inner and outer loop; without inner loop, only outer loop; without outer loop, only inner loop; no inner or outer loop.
Figure 5: Predictive accuracy of the trained model increases as alphas accumulate. The y-axis shows the mean squared error (MSE).
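
The IC score plotted in Figure 3 is not defined in this summary. A minimal sketch, assuming the common convention that the information coefficient is the Pearson correlation between an alpha's values and the subsequent returns of the same assets, is given below. The function name `information_coefficient` and the sample inputs are hypothetical, and the paper may use a different variant (for example, a rank correlation).

```python
import math
from typing import Sequence


def information_coefficient(signal: Sequence[float],
                            forward_returns: Sequence[float]) -> float:
    """Pearson correlation between signal values and next-period returns.

    Assumed definition for illustration; returns NaN when the correlation
    is undefined (fewer than two points or zero variance).
    """
    if len(signal) != len(forward_returns):
        raise ValueError("signal and forward_returns must have equal length")
    n = len(signal)
    if n < 2:
        return math.nan
    mean_s = sum(signal) / n
    mean_r = sum(forward_returns) / n
    cov = sum((s - mean_s) * (r - mean_r)
              for s, r in zip(signal, forward_returns))
    var_s = sum((s - mean_s) ** 2 for s in signal)
    var_r = sum((r - mean_r) ** 2 for r in forward_returns)
    if var_s == 0.0 or var_r == 0.0:
        return math.nan
    return cov / math.sqrt(var_s * var_r)
```

A signal that ranks assets in exactly the order of their later returns scores close to 1, a signal that ranks them in reverse scores close to -1, and an uninformative signal scores near 0.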

Theorems & Definitions (7)

Definition 4.2
Lemma 4.3
proof
Lemma 4.5
proof
Theorem 4.6
proof
