A Toolbox for Improving Evolutionary Prompt Search
Daniel Grießhaber, Maximilian Kimmich, Johannes Maucher, Ngoc Thang Vu
TL;DR
This paper tackles the high cost and fragility of evolutionary prompt optimization for LLMs with a modular framework that decomposes the process into initialization, evolution, evaluation, and selection. It uses an LLM as both the evolutionary operator and the judge, adds a human in the loop, and adopts chain-of-instructions (CoI) prompting to improve control and feedback granularity. Efficient evaluation strategies, including moment-based and parent-based early stopping plus strategic data ordering, reduce computational overhead while preserving performance. Empirical results across diverse NLP tasks show that CoI prompting, together with an LLM judge and human feedback, yields consistent improvements and better resource efficiency, and the approach proves robust across multiple LLM variants. The authors also release their code to facilitate further research and application.
Abstract
Evolutionary prompt optimization has demonstrated effectiveness in refining prompts for LLMs. However, existing approaches lack robust operators and efficient evaluation mechanisms. In this work, we propose several key improvements to evolutionary prompt optimization, some of which also carry over to prompt optimization more broadly: 1) decomposing evolution into distinct steps to improve both the evolution and control over it, 2) introducing an LLM-based judge to verify the evolutions, 3) integrating human feedback to refine the evolutionary operator, and 4) developing more efficient evaluation strategies that maintain performance while reducing computational overhead. Our approach improves both optimization quality and efficiency. We release our code, enabling prompt optimization on new tasks and facilitating further research in this area.
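To make the idea of cheaper evaluation concrete, the following is a minimal sketch of one way a candidate prompt's evaluation could be cut short by comparing it with its parent. It is not the authors' implementation: the function name, the parameters, and the stopping rule itself (a conservative bound that stops only when the candidate can no longer reach the parent's score) are assumptions made for illustration, as is the premise that per-example scores lie in [0, 1] and are averaged.

```python
from typing import Callable, Sequence, Tuple


def evaluate_with_parent_early_stopping(
    score_example: Callable[[str, object], float],
    prompt: str,
    examples: Sequence[object],
    parent_score: float,
    min_examples: int = 1,
) -> Tuple[float, int, bool]:
    """Score `prompt` on `examples`, stopping once it cannot reach `parent_score`.

    Hypothetical helper. Assumes each per-example score is in [0, 1] and the
    final score is the mean over all examples.
    Returns (mean over the examples seen, number of examples seen, stopped_early).
    """
    total = 0.0
    n = len(examples)
    for seen, example in enumerate(examples, start=1):
        total += score_example(prompt, example)
        # Best mean still reachable if every remaining example scored 1.0.
        best_possible = (total + (n - seen)) / n
        if min_examples <= seen < n and best_possible < parent_score:
            return total / seen, seen, True
    return (total / n if n else 0.0), n, False


if __name__ == "__main__":
    # Toy scorer (assumption): exact match between a fake prediction and a label.
    data = [("a", "a"), ("b", "x"), ("c", "y"), ("d", "z"), ("e", "e")]
    scorer = lambda prompt, ex: 1.0 if ex[0] == ex[1] else 0.0
    print(evaluate_with_parent_early_stopping(scorer, "some prompt", data, 0.8))
```

Because the bound is exact, this rule never discards a candidate that could still match its parent on the full set; the savings depend on how early the weak examples appear, which is one reason the order of the evaluation data matters.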
