Alpha Mining and Enhancing via Warm Start Genetic Programming for Quantitative Investment

Weizhe Ren, Yichen Qin, Yang Li

TL;DR

This work tackles the challenge of discovering stock alpha factors with genetic programming by introducing a Warm Start GP framework that confines the search to a predefined, effective alpha structure and starts from a proven alpha. It relies on two hypotheses, structure-effectiveness and factor-effectiveness, to justify searching within a fixed structure, and demonstrates that this yields a finite, denser space of viable alphas, reduced correlation among candidates, and improved out-of-sample predictive power. Empirical validation on 2020–2024 Chinese stock market data shows higher IC-based metrics and stronger portfolio performance than Alpha101 baselines and traditional GP, including AR > $50\%$ and SR > $1.0$ for larger holding portfolios. The framework thus acts as both an alpha miner and an alpha enhancer, offering interpretability and efficiency benefits, while pointing to future work on more advanced aggregation models and on reducing GP's computational cost.

Abstract

Traditional genetic programming (GP) often struggles in stock alpha factor discovery because of its vast search space, heavy computational burden, and the sparsity of effective alphas. We find that GP performs better when it focuses on promising regions rather than searching at random. This paper proposes a new GP framework with carefully chosen initialization and structural constraints to enhance search performance and improve the interpretability of the alpha factors. The approach is motivated by, and mimics, the way alphas are searched for in practice, and aims to make that process more efficient. Analysis of 2020–2024 Chinese stock market data shows that our method yields superior out-of-sample prediction results and higher portfolio returns than the benchmark.
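
As a rough illustration of the IC-based evaluation mentioned in the TL;DR, the sketch below computes a daily information coefficient as the cross-sectional Pearson correlation between alpha values and next-period returns, then averages it over days. This is the standard textbook definition, not necessarily the exact metric used in the paper; the function names `pearson`, `daily_ic`, and `mean_ic` and the toy data layout (one list of per-stock values per day) are assumptions made for this example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def daily_ic(alpha_by_day, return_by_day):
    """One IC per day: correlation across stocks of alpha vs. next-period return."""
    return [pearson(a, r) for a, r in zip(alpha_by_day, return_by_day)]


def mean_ic(alpha_by_day, return_by_day):
    """Average of the daily ICs over the evaluation window."""
    return mean(daily_ic(alpha_by_day, return_by_day))
```

Each inner list holds one value per stock for a single day, with alpha values and returns aligned by position. Replacing the raw values with their cross-sectional ranks before calling `pearson` would give the rank-based variant.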

Paper Structure

This paper contains 13 sections, 5 equations, 14 figures, 3 tables, 1 algorithm.

Figures (14)

  • Figure 1: An example of a GP tree. The green parts are root nodes and the pink parts are leaf nodes.
  • Figure 2: The results indicate that effective alphas are sparse within the search space of traditional GP.
  • Figure 3: Alpha2, Alpha3, and Alpha4 share the same structure as the effective Alpha1.
  • Figure 4: The blue segment represents fully random results, previously shown in the earlier random-search figure, while the purple segment reflects the results obtained under the given structural constraints.
  • Figure 5: The restricted crossover allows only alphas with the same structure to exchange subtrees at the same positions, ensuring that the alpha structure remains unchanged.
  • ...and 9 more figures
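
The restricted crossover described in the Figure 5 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that an alpha is stored as a nested list `[operator, child, ...]` with plain strings as leaves, and that "same structure" means identical tree shape once all labels are erased. The helper names and the operator names in the usage note are hypothetical.

```python
import random


def shape(tree):
    """Tree shape with all labels erased (the assumed notion of 'structure')."""
    if isinstance(tree, str):
        return ()
    return tuple(shape(child) for child in tree[1:])


def positions(tree, path=()):
    """Yield the path (tuple of child indices) of every node in the tree."""
    yield path
    if not isinstance(tree, str):
        for i, child in enumerate(tree[1:]):
            yield from positions(child, path + (i,))


def get_subtree(tree, path):
    """Follow a path of child indices down from the root."""
    for i in path:
        tree = tree[i + 1]  # index 0 holds the operator name
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0] + 1
    return tree[:i] + [replace_subtree(tree[i], path[1:], new)] + tree[i + 1:]


def restricted_crossover(a, b, rng=random):
    """Swap the subtrees found at one shared position of two same-shape parents."""
    if shape(a) != shape(b):
        raise ValueError("parents must share the same structure")
    candidates = [p for p in positions(a) if p]  # skip the root position
    if not candidates:
        return a, b  # single-leaf parents: nothing to exchange
    path = rng.choice(candidates)
    sub_a, sub_b = get_subtree(a, path), get_subtree(b, path)
    return replace_subtree(a, path, sub_b), replace_subtree(b, path, sub_a)
```

For example, crossing `["div", ["ts_mean", "close"], ["ts_std", "volume"]]` with `["sub", ["ts_max", "high"], ["ts_min", "low"]]` produces two children whose `shape` equals that of both parents. The two parents have the same shape, so the subtrees at any shared position do too, and swapping them cannot change the overall structure.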