A Survey of Automatic Prompt Engineering: An Optimization Perspective
Wenwu Li, Xiangfeng Wang, Wenhao Li, Bo Jin
TL;DR
This survey examines how to optimize prompts for foundation models from an optimization-theoretic viewpoint, unifying discrete, continuous, and hybrid prompt spaces across text, vision, and multimodal tasks. It formalizes prompt design as a maximization problem over a validation set and categorizes methods into FM-based, evolutionary, gradient-based, and reinforcement learning (RL) approaches, linking practical implementations with theoretical foundations. The work differentiates prompt components (instructions, thoughts, exemplars, annotations) and downstream objective families, clarifying how each combination shapes performance under potential constraints. By outlining key future directions, such as constrained, multi-task, online, and multi-objective prompt optimization, it provides a rigorous roadmap for researchers and practitioners aiming to deploy adaptive, scalable prompt systems in real-world, cross-modal settings.
Abstract
The rise of foundation models has shifted focus from resource-intensive fine-tuning to prompt engineering, a paradigm that steers model behavior through input design rather than weight updates. While manual prompt engineering faces limitations in scalability, adaptability, and cross-modal alignment, automated methods, spanning foundation model (FM) based optimization, evolutionary methods, gradient-based optimization, and reinforcement learning, offer promising solutions. Existing surveys, however, remain fragmented across modalities and methodologies. This paper presents the first comprehensive survey on automated prompt engineering through a unified optimization-theoretic lens. We formalize prompt optimization as a maximization problem over discrete, continuous, and hybrid prompt spaces, systematically organizing methods by their optimization variables (instructions, soft prompts, exemplars), task-specific objectives, and computational frameworks. By bridging theoretical formulation with practical implementations across text, vision, and multimodal domains, this survey establishes a foundational framework for both researchers and practitioners, while highlighting underexplored frontiers in constrained optimization and agent-oriented prompt design.
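As a rough illustration of the maximization problem described above, the objective can be written along the following lines. The notation here is an assumption made for this sketch and is not taken from the text above: f denotes the foundation model, g a task-specific evaluation metric, D_val the validation set, and P_d and P_c the discrete and continuous prompt spaces (with their product standing in for the hybrid case).

```latex
% Illustrative sketch only: all symbols (f, g, D_val, P_d, P_c) are assumed notation.
\[
P^{*} \;=\; \arg\max_{P \in \mathcal{P}} \;
\frac{1}{|\mathcal{D}_{\mathrm{val}}|}
\sum_{(x,\, y) \in \mathcal{D}_{\mathrm{val}}}
g\bigl(f(P, x),\, y\bigr),
\qquad
\mathcal{P} \in
\bigl\{\, \mathcal{P}_{\mathrm{d}},\;
\mathcal{P}_{\mathrm{c}},\;
\mathcal{P}_{\mathrm{d}} \times \mathcal{P}_{\mathrm{c}} \,\bigr\}
\]
```

Under this reading, the four method families named in the abstract differ mainly in how they search the prompt space: gradient-based methods would apply most naturally when P is continuous (soft prompts), while FM-based, evolutionary, and reinforcement learning methods can operate on discrete text where gradients with respect to P are unavailable.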
