RankEvolve: Automating the Discovery of Retrieval Algorithms via LLM-Driven Evolution

Jinming Nian, Fangchen Li, Dae Hoon Park, Yi Fang

TL;DR

The results suggest that evaluator-guided LLM program evolution is a practical path towards automatic discovery of novel ranking algorithms.

Abstract

Retrieval algorithms like BM25 and query likelihood with Dirichlet smoothing remain strong and efficient first-stage rankers, yet improvements have mostly relied on parameter tuning and human intuition. We investigate whether a large language model, guided by an evaluator and evolutionary search, can automatically discover improved lexical retrieval algorithms. We introduce RankEvolve, a program evolution setup based on AlphaEvolve, in which candidate ranking algorithms are represented as executable code and iteratively mutated, recombined, and selected based on retrieval performance across 12 IR datasets from BEIR and BRIGHT. RankEvolve starts from two seed programs: BM25 and query likelihood with Dirichlet smoothing. The evolved algorithms are novel, effective, and show promising transfer to the full BEIR and BRIGHT benchmarks as well as TREC DL 19 and 20. Our results suggest that evaluator-guided LLM program evolution is a practical path towards automatic discovery of novel ranking algorithms.

Paper Structure

This paper contains 36 sections, 28 equations, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Combined score over evolution steps for two seed programs. The combined score is the optimization target, defined as $0.8 \times \text{Avg Recall@100} + 0.2 \times \text{Avg nDCG@10}$, averaged across 12 IR datasets.
  • Figure 2: Evolution trajectories for the BM25 seed (left) and Dirichlet seed (right). Recall@100 improves nearly monotonically in both runs, while nDCG@10 occasionally regresses at the same steps, reflecting deliberate trade-offs made by the evolutionary process to maximize the optimization target.
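
To make the optimization target in the Figure 1 caption concrete, here is a minimal Python sketch of the combined score. Only the 0.8 and 0.2 weights and the averaging across datasets come from the caption. The function name `combined_score`, the dictionary-based inputs, and the dataset names are illustrative assumptions, not the authors' evaluator code.

```python
from statistics import mean


def combined_score(recall_at_100, ndcg_at_10, w_recall=0.8, w_ndcg=0.2):
    """Weighted sum of dataset-averaged Recall@100 and nDCG@10.

    Both arguments map a dataset name to that dataset's metric value.
    The input layout is an assumption made for this sketch.
    """
    if recall_at_100.keys() != ndcg_at_10.keys():
        raise ValueError("both metrics must cover the same datasets")
    avg_recall = mean(list(recall_at_100.values()))
    avg_ndcg = mean(list(ndcg_at_10.values()))
    return w_recall * avg_recall + w_ndcg * avg_ndcg


# Hypothetical metric values for two candidate programs on two datasets.
candidate_a = combined_score(
    {"dataset_a": 0.60, "dataset_b": 0.60},
    {"dataset_a": 0.40, "dataset_b": 0.40},
)
candidate_b = combined_score(
    {"dataset_a": 0.62, "dataset_b": 0.62},
    {"dataset_a": 0.38, "dataset_b": 0.38},
)

# Candidate B has lower nDCG@10 but still scores higher overall,
# because Recall@100 carries four times the weight.
print(candidate_b > candidate_a)
```

Under these weights, a gain in average Recall@100 outweighs an equal-sized loss in average nDCG@10. This is consistent with the behaviour described in the Figure 2 caption, where nDCG@10 occasionally regresses while the combined score rises.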