RankEvolve: Automating the Discovery of Retrieval Algorithms via LLM-Driven Evolution

Jinming Nian; Fangchen Li; Dae Hoon Park; Yi Fang

TL;DR

The results suggest that evaluator-guided LLM program evolution is a practical path towards automatic discovery of novel ranking algorithms.

Abstract

Retrieval algorithms like BM25 and query likelihood with Dirichlet smoothing remain strong and efficient first-stage rankers, yet improvements have mostly relied on parameter tuning and human intuition. We investigate whether a large language model, guided by an evaluator and evolutionary search, can automatically discover improved lexical retrieval algorithms. We introduce RankEvolve, a program evolution setup based on AlphaEvolve, in which candidate ranking algorithms are represented as executable code and iteratively mutated, recombined, and selected based on retrieval performance across 12 IR datasets from BEIR and BRIGHT. RankEvolve starts from two seed programs: BM25 and query likelihood with Dirichlet smoothing. The evolved algorithms are novel, effective, and show promising transfer to the full BEIR and BRIGHT benchmarks as well as TREC DL 19 and 20. Our results suggest that evaluator-guided LLM program evolution is a practical path towards automatic discovery of novel ranking algorithms.
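For readers unfamiliar with the first seed, the following is a minimal sketch of the textbook Okapi BM25 scoring function, written as the kind of small executable program the abstract describes. It is not the paper's actual seed code: the function name `bm25_scores`, the pre-tokenized inputs, the defaults `k1=1.2` and `b=0.75`, and the `log(1 + ...)` IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every document against the query with textbook BM25.

    query_tokens: list of query terms (already tokenized).
    docs_tokens:  list of documents, each a list of terms.
    k1, b:        common default parameters (assumed, not from the paper).
    """
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    avgdl = sum(len(doc) for doc in docs_tokens) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

In a program evolution setup such as the one described, a function of this shape would be the object that gets mutated and re-evaluated; which parts of the program are actually open to mutation is defined by the paper's search space, not by this sketch.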

Paper Structure (36 sections, 28 equations, 2 figures, 3 tables)

Introduction
Related Work
Method
Search Space
Population Management
Mutation Proposal
Evaluator
Experiments
Baselines
Setup
Results
The Evolved BM25 Algorithm
The Evolved Query Likelihood Algorithm
Convergent Principles Across Seeds
Ablation Study
...and 21 more sections

Figures (2)

Figure 1: Combined score over evolution steps for two seed programs. The combined score is the optimization target, defined as $0.8 \times \text{Avg Recall@100} + 0.2 \times \text{Avg nDCG@10}$, averaged across 12 IR datasets.
Figure 2: Evolution trajectories for the BM25 seed (left) and Dirichlet seed (right). Recall@100 improves nearly monotonically in both runs, while nDCG@10 occasionally regresses at the same steps, reflecting deliberate trades made by the evolutionary process to maximize the optimization target.
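
The optimization target in the Figure 1 caption can be sketched in a few lines. This is a hedged illustration only: the function name `combined_score`, the input layout (a dict mapping dataset name to a `(recall_at_100, ndcg_at_10)` pair), and the example numbers are assumptions, while the 0.8 and 0.2 weights come from the caption.

```python
from statistics import mean


def combined_score(per_dataset, recall_weight=0.8, ndcg_weight=0.2):
    """Weighted sum of dataset-averaged Recall@100 and nDCG@10.

    per_dataset: dict mapping a dataset name to a tuple
                 (recall_at_100, ndcg_at_10), both in [0, 1].
    """
    if not per_dataset:
        raise ValueError("at least one dataset is required")
    avg_recall = mean(recall for recall, _ in per_dataset.values())
    avg_ndcg = mean(ndcg for _, ndcg in per_dataset.values())
    return recall_weight * avg_recall + ndcg_weight * avg_ndcg


# Made-up numbers: the candidate gains recall and loses nDCG,
# yet still scores higher on the combined target.
baseline = {"dataset_a": (0.60, 0.40)}
candidate = {"dataset_a": (0.64, 0.36)}
print(combined_score(baseline), combined_score(candidate))
```

Because recall carries four times the weight of nDCG, a candidate can give up some nDCG@10 and still be selected if Recall@100 rises enough, which is consistent with the occasional nDCG@10 regressions described in the Figure 2 caption.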
