MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization
Yuyan Chen, Zhihao Wen, Ge Fan, Zhengyu Chen, Wei Wu, Dayiheng Liu, Zhixu Li, Bang Liu, Yanghua Xiao
TL;DR
This paper shows that prompts are not universally optimal across LLMs and proposes MAPO, a framework that tailors prompts to each model to improve performance on NLP tasks. MAPO combines a warm-up dataset, supervised fine-tuning, a learned reward model, and reinforcement learning (PPO with model feedback) to generate model-adaptive prompts $P_o$ from original prompts $P$. In extensive experiments on BLOOM-7B, GPT-J-6B, and LLaMA-7B across QA, classification, and generation tasks, MAPO demonstrates robust improvements and notable domain transfer capabilities, and ablation studies highlight the value of RL and the importance of maintaining generalization. MAPO requires substantial warm-up data and computational resources, which motivates future work on more efficient training and broader applicability across languages and tasks.
Abstract
Prompt engineering, as an efficient and effective way to leverage Large Language Models (LLMs), has drawn a lot of attention from the research community. Existing research primarily emphasizes adapting prompts to specific tasks rather than to specific LLMs. However, a good prompt is not defined solely by its wording; it also depends on the nature of the LLM in question. In this work, we first quantitatively demonstrate that different prompts should be adapted to different LLMs to enhance their capabilities across various downstream tasks in NLP. We then propose a novel model-adaptive prompt optimizer (MAPO) method that optimizes the original prompts for each specific LLM in downstream tasks. Extensive experiments indicate that the proposed method can effectively refine prompts for an LLM, leading to significant improvements across various downstream tasks.
