Diverse Prompts: Illuminating the Prompt Space of Large Language Models with MAP-Elites
Gabriel Machado Santos, Rita Maria da Silva Julia, Marcelo Zanchetta do Nascimento
TL;DR
This work examines how prompt structure affects LLM task performance and introduces a representation based on a context-free grammar (CFG), combined with MAP-Elites, to systematically map the prompt space for both quality and diversity. The approach yields diverse, high-performing prompts across seven BigBench Lite tasks and four LLMs with fewer than 10B parameters. It also reveals that the relationship between prompt structure and performance is task-dependent. The findings offer practical guidance for adaptive prompt design and demonstrate the potential of quality-diversity search to enhance in-context learning. Overall, the framework provides a scalable, principled method for exploring and exploiting prompt architectures in real-world NLP settings.
Abstract
Prompt engineering is essential for optimizing large language models (LLMs), yet the link between prompt structure and task performance remains underexplored. This work introduces an evolutionary approach that combines a context-free grammar (CFG) with the MAP-Elites algorithm to systematically explore the prompt space. Our method prioritizes both quality and diversity. It generates high-performing, structurally varied prompts and analyzes how well they suit different tasks as traits such as the number of examples (shots) and the depth of reasoning vary. By systematically mapping the phenotypic space, we reveal how structural variations influence LLM performance, offering actionable insights for task-specific and adaptable prompt design. Our results on seven BigBench Lite tasks across multiple LLMs underscore the critical interplay between quality and diversity, advancing the effectiveness and versatility of LLMs.
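The abstract describes the method only at a high level. As a rough illustration, the following Python sketch runs a MAP-Elites loop over prompts derived from a small context-free grammar. It uses the two traits named above, number of shots and reasoning depth, as the behaviour descriptor. The grammar, the subtree-mutation operator, the 6×6 grid, and `toy_fitness` are all hypothetical stand-ins chosen for illustration. In the actual method, fitness would come from evaluating an LLM on a BigBench Lite task, and the authors' grammar and operators may differ.

```python
import random

# Hypothetical toy grammar (NOT the paper's CFG). Strings not listed as keys are terminals.
GRAMMAR = {
    "<prompt>": [["<instruction>", "<shots>", "<reasoning>", "Q: {question}\nA:"]],
    "<instruction>": [["Answer the question."], ["Solve the task carefully."]],
    # Right-recursive rules: each <example> adds one shot, each <step> one reasoning step.
    "<shots>": [[], ["<example>", "<shots>"]],
    "<example>": [["Q: {example_question}\nA: {example_answer}"]],
    "<reasoning>": [[], ["<step>", "<reasoning>"]],
    "<step>": [["Think step by step."], ["Verify each intermediate result."]],
}
MAX_DEPTH = 8  # at or beyond this depth, always take the first (terminating) production


def expand(symbol, rng, depth=0):
    """Randomly derive a tree: (symbol, children) for nonterminals, str for terminals."""
    if symbol not in GRAMMAR:
        return symbol
    options = GRAMMAR[symbol]
    production = options[0] if depth >= MAX_DEPTH else rng.choice(options)
    return (symbol, [expand(s, rng, depth + 1) for s in production])


def nonterminal_paths(tree, path=()):
    """Yield the child-index path to every nonterminal node (path length == depth)."""
    if isinstance(tree, tuple):
        yield path
        for i, child in enumerate(tree[1]):
            yield from nonterminal_paths(child, path + (i,))


def get_at(tree, path):
    for i in path:
        tree = tree[1][i]
    return tree


def replace_at(tree, path, new_subtree):
    """Return a copy of `tree` with the node at `path` replaced (original is untouched)."""
    if not path:
        return new_subtree
    symbol, children = tree
    children = list(children)
    children[path[0]] = replace_at(children[path[0]], path[1:], new_subtree)
    return (symbol, children)


def mutate(tree, rng):
    """Grammar-guided subtree mutation: re-derive one randomly chosen nonterminal."""
    path = rng.choice(list(nonterminal_paths(tree)))
    symbol = get_at(tree, path)[0]
    return replace_at(tree, path, expand(symbol, rng, depth=len(path)))


def count(tree, symbol):
    if not isinstance(tree, tuple):
        return 0
    return (tree[0] == symbol) + sum(count(c, symbol) for c in tree[1])


def descriptor(tree, max_shots=5, max_steps=5):
    """Behaviour descriptor (shots, reasoning steps), clipped to a 6x6 grid."""
    return (min(count(tree, "<example>"), max_shots),
            min(count(tree, "<step>"), max_steps))


def render(tree):
    """Flatten a derivation tree into prompt text, skipping empty productions."""
    if not isinstance(tree, tuple):
        return tree
    return "\n".join(p for p in (render(c) for c in tree[1]) if p)


def toy_fitness(tree):
    """Hypothetical stand-in for task accuracy; a real run would query an LLM."""
    shots, steps = count(tree, "<example>"), count(tree, "<step>")
    return 1.0 - 0.1 * abs(shots - 3) - 0.05 * abs(steps - 2)


def map_elites(evaluate, iterations=2000, n_init=50, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, tree); each cell keeps only its best prompt

    def try_insert(tree):
        cell, fit = descriptor(tree), evaluate(tree)
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, tree)

    for _ in range(n_init):  # seed the archive with random derivations
        try_insert(expand("<prompt>", rng))
    for _ in range(iterations):  # select a random elite, mutate, try to insert
        _, parent = rng.choice(list(archive.values()))
        try_insert(mutate(parent, rng))
    return archive


if __name__ == "__main__":
    archive = map_elites(toy_fitness)
    for cell in sorted(archive):
        print(cell, round(archive[cell][0], 2))
    print(render(max(archive.values(), key=lambda e: e[0])[1]))
```

Because each cell retains only its best prompt, the returned archive is a map of elites across prompt structures rather than a single optimum. Comparing fitness across cells is what would expose how structure relates to performance for a given task.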
