Synergistic Formulaic Alpha Generation for Quantitative Trading based on Reinforcement Learning

Hong-Gi Shin, Sukhyun Jeong, Eui-Yeon Kim, Sungho Hong, Young-Jin Cho, Yong-Hoon Choi

TL;DR

This work addresses the challenge of generating synergistic formulaic alpha factors for quantitative trading by leveraging reinforcement learning in a broadened search space and seed-based initialization. It expands the operator set and uses pre-generated seed alphas to bias exploration, aiming to maximize $IC$ and $RankIC$ on CSI300 data. Across experiments and case studies, the approach consistently improves information-based performance metrics and backtested cumulative returns compared to prior methods, demonstrating the value of seed initialization and space expansion for synergistic alpha sets. Limitations include increased complexity from longer formulas and early instability in information coefficients, with future work pointing to multi-market validation and synchronized initialization with the experience buffer.

Abstract

Mining formulaic alpha factors is the process of discovering and developing specific factors or indicators (referred to as alpha factors) for quantitative trading in the stock market. Reinforcement learning (RL) is commonly employed to discover alpha factors efficiently in a vast search space. This paper proposes a method that enhances existing alpha factor mining approaches by expanding the search space and using a pretrained formulaic alpha set as initial seed values to generate synergistic formulaic alphas. We use the information coefficient (IC) and the rank information coefficient (Rank IC) as performance evaluation metrics for the model. Using CSI300 market data, we conducted real investment simulations and observed significant performance improvement over existing techniques.
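
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common convention that IC is the cross-sectional Pearson correlation between a factor's values and the subsequent returns, averaged over days, and that Rank IC is the same quantity computed on ranks; the paper's exact definition is not given in this excerpt. The function names and the toy data are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def ranks(x):
    """Ordinal ranks 0..n-1 (ties broken by position; a simplification)."""
    order = np.argsort(x, kind="stable")
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(len(x))
    return r

def daily_ic(alpha, fwd_ret):
    """Assumed IC for one day: correlation across stocks."""
    return pearson(alpha, fwd_ret)

def daily_rank_ic(alpha, fwd_ret):
    """Assumed Rank IC for one day: correlation of cross-sectional ranks."""
    return pearson(ranks(alpha), ranks(fwd_ret))

def mean_ic(alpha_panel, ret_panel, fn=daily_ic):
    """Average a daily metric over all days (rows) of a days-by-stocks panel."""
    return float(np.mean([fn(a, r) for a, r in zip(alpha_panel, ret_panel)]))

# Toy panel: 2 days x 4 stocks (made-up numbers, not from the paper).
alpha = np.array([[0.1, 0.4, 0.2, 0.9],
                  [0.3, 0.1, 0.8, 0.5]])
ret = np.array([[0.01, 0.03, 0.02, 0.20],
                [0.00, -0.01, 0.05, 0.02]])

ic = mean_ic(alpha, ret)
rank_ic = mean_ic(alpha, ret, daily_rank_ic)
print(f"IC={ic:.3f}, RankIC={rank_ic:.3f}")
```

In this toy panel the factor orders the stocks exactly as the returns do on both days, so the Rank IC is 1 while the IC is below 1 because the relationship is monotone but not linear.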

Paper Structure (12 sections, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Comparison of IC as the pool size varies in the Combination model on CSI300. The average IC over five random seeds is shown for each pool size. Red: original search space; blue: expanded search space.
  • Figure 2: IC over the test period when the alpha set is initialized with formulaic alpha factors from Alpha 101. Non-Init: training without initializing from a separate formulaic alpha set. Blue: original search space; red: expanded search space; green: test-set IC of the existing Alpha 101 formulas.
  • Figure 3: IC over the test period when the alpha set is initialized with the generated alpha set. Non-Init: training without inserting any separate formulaic alphas. Blue: original search space; red: expanded search space; green: test-set IC of the alphas generated with a pool size of 10.
  • Figure 4: Cumulative returns over the test period for the generated mega alpha set. The pool size is set to 20, and each of the five random-seed runs is shown individually. Black: CSI300 index over the test period; blue: alpha set generated with the original search space; red: alpha set generated with the expanded search space; green: alpha set generated after initializing with the alpha set produced at pool size 10.