DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective
Dengyun Peng, Yuhang Zhou, Qiguang Chen, Jinhao Liu, Jingjing Chen, Libo Qin
TL;DR
DLPO reframes prompt optimization as a gradient-based process guided by a forward and a backward engine, and introduces seven text-based strategies inspired by traditional deep learning to enhance robustness, efficiency, and generalization. The framework systematically reduces update instability with TextualLearningRate, TextualDropout, and TextualSimulatedAnnealing, accelerates convergence with TextualLearningRateDecay, TextualMomentum, and TextualContrastiveLearning, and controls prompt complexity with TextualRegularization. Across GSM8K, BigGSM, BBH, MATH, and MGSM, DLPO achieves state-of-the-art results, outperforming TextGrad, APO, and even human-crafted prompts on several benchmarks. The work provides practical guidance and a publicly available implementation for robust, efficient, and generalizable automated prompt optimization in real-world settings.
Abstract
Large Language Models (LLMs) have achieved remarkable success across diverse tasks, largely driven by well-designed prompts. However, crafting and selecting such prompts often requires considerable human effort, which significantly limits scalability. To mitigate this, recent studies have explored automated prompt optimization as a promising solution. Despite these efforts, existing methods still face critical challenges in robustness, efficiency, and generalization. To systematically address these challenges, we first conduct an empirical analysis to identify the limitations of the current reflection-based prompt optimization paradigm. Building on these insights, we propose seven innovative approaches inspired by traditional deep learning paradigms for prompt optimization (DLPO), seamlessly integrating these concepts into text-based gradient optimization. Through these advancements, we progressively tackle the aforementioned challenges and validate our methods through extensive experimentation. We hope our study not only provides valuable guidance for future research but also offers a comprehensive understanding of the challenges and potential solutions in prompt optimization. Our code is available at https://github.com/sfasfaffa/DLPO.
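To make the idea of text-based gradient optimization more concrete, the following is a minimal sketch of an optimization loop with a forward step (scoring a prompt) and a backward step (producing textual feedback and a rewrite). It is not the DLPO implementation. Every name in it (`TextualOptimizerState`, `build_update_instruction`, `accept`, `optimize`, and the `score_fn`, `feedback_fn`, `rewrite_fn` callables) is hypothetical, and the way it maps a learning rate, learning rate decay, momentum, and simulated annealing onto text is only one plausible reading of the strategy names in the TL;DR. In practice the three callables would wrap LLM calls; here they are left as plain functions.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TextualOptimizerState:
    # Assumed reading of a "textual learning rate": how many sentences
    # the rewrite step is allowed to change in one update.
    max_edits: int = 4
    # Assumed reading of "learning rate decay": shrink max_edits each step.
    decay: float = 0.5
    # Assumed reading of "simulated annealing": temperature and cooling rate.
    temperature: float = 1.0
    cooling: float = 0.8
    # Assumed reading of "momentum": feedback from earlier steps is reused.
    history: List[str] = field(default_factory=list)
    momentum_window: int = 3


def build_update_instruction(state: TextualOptimizerState, feedback: str) -> str:
    """Compose the text handed to the rewrite step for one update."""
    lines = [
        f"Change at most {state.max_edits} sentence(s) of the prompt.",
        f"Current feedback: {feedback}",
    ]
    recent = state.history[-state.momentum_window:]
    if recent:
        lines.append("Earlier feedback to stay consistent with: " + " | ".join(recent))
    return "\n".join(lines)


def accept(old_score: float, new_score: float, temperature: float,
           rng: random.Random) -> bool:
    """Metropolis-style acceptance: always take improvements, and take a
    worse candidate with probability exp(delta / temperature)."""
    if new_score >= old_score:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp((new_score - old_score) / temperature)


def optimize(prompt: str,
             score_fn: Callable[[str], float],
             feedback_fn: Callable[[str], str],
             rewrite_fn: Callable[[str, str], str],
             steps: int = 5,
             seed: int = 0) -> Tuple[str, float, TextualOptimizerState]:
    """Run the toy loop and return the best prompt, its score, and the state."""
    rng = random.Random(seed)
    state = TextualOptimizerState()
    current, current_score = prompt, score_fn(prompt)
    best, best_score = current, current_score
    for _ in range(steps):
        feedback = feedback_fn(current)                # backward step: textual "gradient"
        instruction = build_update_instruction(state, feedback)
        candidate = rewrite_fn(current, instruction)   # apply the textual update
        candidate_score = score_fn(candidate)          # forward step: evaluate
        if accept(current_score, candidate_score, state.temperature, rng):
            current, current_score = candidate, candidate_score
        if current_score > best_score:
            best, best_score = current, current_score
        state.history.append(feedback)
        state.max_edits = max(1, int(state.max_edits * state.decay))
        state.temperature *= state.cooling
    return best, best_score, state
```

A caller would supply `score_fn` as accuracy on a held-out batch and implement `feedback_fn` and `rewrite_fn` as prompts to a model; the loop itself stays unchanged. Tracking the best prompt separately from the current one matters here because annealing can accept a worse candidate.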
