
GAAPO: Genetic Algorithmic Applied to Prompt Optimization

Xavier Sécheresse, Jacques-Yves Guilbert--Ly, Antoine Villedieu de Torcy

TL;DR

GAAPO addresses the challenge of automated prompt optimization for LLMs by combining a genetic algorithm with multiple prompt-generation strategies in a unified evolutionary framework. The approach is evaluated on ETHOS, MMLU-Pro, and GPQA, where it shows superior validation performance and competitive generalization compared with baselines such as APO, OPRO, and Mutator. The paper also analyzes the effects of population size, selection methods, and model-specific prompt generators. Key contributions include a modular architecture that integrates forced and random evolution strategies, an evaluation scheme based on bandit methods and successive halving (SH) that reduces computational cost, and cross-model analyses highlighting trade-offs between performance and generalization. The work provides practical insights into automatic prompt optimization and establishes GAAPO as a flexible, extensible platform for advancing LLM prompting across tasks and models.
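The TL;DR mentions an SH-enabled evaluation scheme for reducing computational cost. Assuming SH refers to successive halving, the sketch below shows the generic idea on candidate prompts: score every candidate on a small subset of examples, keep the better half, then double the subset for the survivors. It is an illustration, not GAAPO's implementation; the function name `successive_halving`, the starting subset size of 2, the doubling schedule, and the keyword-matching scorer (a stand-in for calling an LLM on labeled data) are all assumptions.

```python
import math
from typing import Callable, List, Sequence


def successive_halving(
    candidates: List[str],
    examples: Sequence[str],
    score: Callable[[str, str], float],
    initial_examples: int = 2,
) -> str:
    """Pick one prompt by successive halving (a generic sketch, not GAAPO's code).

    Each round scores every surviving candidate on the first `n` examples,
    keeps the better half (rounded up) and doubles `n`, so weak prompts are
    discarded after being evaluated on only a few examples.
    """
    if not candidates or not examples:
        raise ValueError("need at least one candidate and one example")
    survivors = list(candidates)
    n = max(1, initial_examples)
    while len(survivors) > 1:
        subset = examples[: min(n, len(examples))]
        # (mean score, current position, prompt) for each surviving candidate.
        scored = [
            (sum(score(c, x) for x in subset) / len(subset), i, c)
            for i, c in enumerate(survivors)
        ]
        # Best mean first; ties keep the current order of the survivors.
        scored.sort(key=lambda t: (-t[0], t[1]))
        keep = math.ceil(len(scored) / 2)
        survivors = [c for _, _, c in scored[:keep]]
        n *= 2
    return survivors[0]


# Toy usage: a prompt "passes" an example if it mentions that keyword.
# In a real setting, `score` would query the target LLM on a labeled example.
candidates = ["Answer.", "Answer step by step.", "Be careful: answer step by step."]
examples = ["answer", "step", "careful"]
best = successive_halving(candidates, examples, lambda p, x: float(x in p.lower()))
```

Here "Answer." is dropped after two examples, and only the two stronger prompts are evaluated on the full set. With many candidates and large datasets, this early pruning is what saves LLM calls compared with scoring every candidate on everything.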

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, with their performance heavily dependent on the quality of input prompts. While prompt engineering has proven effective, it typically relies on manual adjustments, making it time-consuming and potentially suboptimal. This paper introduces GAAPO (Genetic Algorithm Applied to Prompt Optimization), a novel hybrid optimization framework that leverages genetic algorithm principles to evolve prompts through successive generations. Unlike traditional genetic approaches that rely solely on mutation and crossover operations, GAAPO integrates multiple specialized prompt generation strategies within its evolutionary framework. Through extensive experimentation on diverse datasets including ETHOS, MMLU-Pro, and GPQA, our analysis reveals several points that matter for the future development of automatic prompt optimization methods, including the trade-off between population size and the number of generations, the effect of selection methods on the stability of results, and the ability of different LLMs, especially reasoning models, to automatically generate prompts from similar queries. Furthermore, we provide insights into the relative effectiveness of different prompt generation strategies and their evolution across optimization phases. These findings contribute to both the theoretical understanding of prompt optimization and practical applications in improving LLM performance.
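The abstract describes evolving prompts over successive generations and names population size, number of generations, and selection method as key factors. The following is a minimal, hypothetical sketch of such a loop, not GAAPO's actual algorithm: prompts are modeled as tuples of instruction fragments, and `crossover`, `mutate`, the elitist "keep the top half" selection, and the toy `toy_fitness` function are illustrative assumptions standing in for GAAPO's LLM-based generation strategies and dataset evaluation.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical representation: a prompt is a tuple of instruction fragments.
Prompt = Tuple[str, ...]


def crossover(a: Prompt, b: Prompt, rng: random.Random) -> Prompt:
    """One-point crossover: a prefix of parent `a` joined to a suffix of parent `b`."""
    return a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]


def mutate(p: Prompt, pool: List[str], rng: random.Random) -> Prompt:
    """Drop a random fragment (half the time, if any) or append one from `pool`."""
    if p and rng.random() < 0.5:
        i = rng.randrange(len(p))
        return p[:i] + p[i + 1:]
    return p + (rng.choice(pool),)


def evolve(
    seed: Prompt,
    pool: List[str],
    fitness: Callable[[Prompt], float],
    population_size: int = 8,
    generations: int = 10,
    rng_seed: int = 0,
) -> Prompt:
    """Elitist genetic loop; costs about population_size * generations fitness calls."""
    if population_size < 2:
        raise ValueError("population_size must be at least 2")
    rng = random.Random(rng_seed)
    population = [seed] + [mutate(seed, pool, rng) for _ in range(population_size - 1)]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism), at least two parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, population_size // 2)]
        # Variation: refill the population with mutated crossovers of parent pairs.
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), pool, rng))
        population = parents + children
    return max(population, key=fitness)


# Toy usage: reward two target fragments and lightly penalize prompt length.
target = {"Think step by step.", "Answer with a single label."}
pool = sorted(target | {"Be concise.", "Use formal language."})


def toy_fitness(p: Prompt) -> float:
    return len(set(p) & target) - 0.1 * len(p)


best = evolve(("Classify the text.",), pool, toy_fitness)
```

Because the parents are carried over unchanged, the best fitness never decreases from one generation to the next. Under a fixed evaluation budget, the product `population_size * generations` is roughly constant, so a larger population leaves fewer generations; this is one way to see the trade-off the abstract highlights. Replacing the top-half rule with another scheme, such as tournament selection, is a natural way to probe how the selection method affects stability.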

Paper Structure

This paper contains 29 sections, 2 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Schema of the general automatic prompt optimization process
  • Figure 2: Description of the GAAPO optimization process.
  • Figure 3: Description of the APO optimization process, which served as a basis for GAAPO.
  • Figure 4: Results obtained by using several prompt generation strategies. LLM-optimizer used: llama-3.1-8B
  • Figure 5: Comparison of optimization trajectories between GPT-4o-mini and LLaMA3-8B models on the ETHOS dataset for GAAPO. The plot shows the evolution of validation scores (solid lines) and test scores (dashed lines) across generations for both models.