Generative Prompt Internalization
Haebin Shin, Lei Ji, Yeyun Gong, Sungdong Kim, Eunbi Choi, Minjoon Seo
TL;DR
GenPI addresses the inefficiency of fixed, lengthy prompts in large language model applications by learning to internalize them through joint training, in which the model also learns to generate the prompt content and the rationale for the resulting behavior change. It combines a standard distillation objective with a Prompt Generation loss and trains lightweight prompt-specific adapters on a small synthetic dataset produced via Self Role-Playing, enabling effective prompt internalization without any prompt input at inference. Across three agent-oriented benchmarks, GenPI achieves near-upper-bound performance on short prompts and maintains strong performance (>82%) on prompts longer than 1,000 tokens, while delivering substantial efficiency gains over prompt-based baselines (up to 39% fewer MACs/FLOPs and 17% lower latency). The approach offers a practical way to reduce inference costs in real-world LLM deployments by internalizing per-prompt behavior, with potential extensions to retrieval-augmented and multimodal settings.
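The joint objective described above can be illustrated with a minimal sketch. It assumes that the distillation term is the per-position KL divergence KL(teacher || student), where the teacher sees the prompt and the student does not, that the Prompt Generation loss is a token-level negative log-likelihood over the prompt text and rationale, and that the two terms are summed with a hypothetical weight `lam`. All function names and the weighting are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one vocabulary distribution.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_distill_loss(teacher_logits, student_logits) -> float:
    # Mean over positions of KL(teacher || student).
    # Assumption: the teacher is conditioned on the prompt, the student is not.
    total = 0.0
    for t, s in zip(teacher_logits, student_logits):
        lt, ls = log_softmax(t), log_softmax(s)
        total += sum(math.exp(a) * (a - b) for a, b in zip(lt, ls))
    return total / len(teacher_logits)


def prompt_generation_loss(gen_logits, target_ids) -> float:
    # Mean negative log-likelihood of generating the prompt content and rationale.
    total = 0.0
    for logits, tok in zip(gen_logits, target_ids):
        total -= log_softmax(logits)[tok]
    return total / len(target_ids)


def joint_loss(teacher_logits, student_logits, gen_logits, target_ids,
               lam: float = 1.0) -> float:
    # Hypothetical weighting: lam balances distillation and prompt generation.
    return (kl_distill_loss(teacher_logits, student_logits)
            + lam * prompt_generation_loss(gen_logits, target_ids))
```

When the student already matches the teacher, the distillation term vanishes and only the generation term remains; with uniform logits over two tokens, the generation loss is exactly log 2.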
Abstract
Prompts used in recent applications built on large language models are often fixed and lengthy, leading to significant computational overhead. To address this challenge, we propose Generative Prompt Internalization (GenPI), a lightweight method that employs a joint training approach. GenPI not only replicates the behavior of models given prompt inputs but also generates the content of the prompt along with the reasons why the model's behavior should change accordingly. We demonstrate that our approach effectively internalizes complex prompts across various agent-based application scenarios. To train effectively without interacting with dedicated environments, we introduce a data synthesis technique that autonomously collects conversational datasets by swapping the roles of the agent and the environment. This method is especially useful when only a predefined prompt is available and no corresponding training dataset exists. By internalizing complex prompts, Generative Prompt Internalization enables high performance and efficient inference without the need for explicit prompts.
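The role-swapping data synthesis can be pictured as a single model alternately playing the agent and the environment. The sketch below assumes a generic `generate(system_prompt, history)` callable and a separate, hypothetical `env_prompt` that instructs the model to act as the environment; both names, and the fixed number of turns, are illustrative assumptions rather than the paper's actual procedure.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: maps (system prompt, dialogue history) to the next message.
Generate = Callable[[str, List[Tuple[str, str]]], str]


def self_role_play(generate: Generate, agent_prompt: str, env_prompt: str,
                   first_observation: str, num_turns: int) -> List[Tuple[str, str]]:
    # The dialogue opens with an initial observation from the environment.
    history: List[Tuple[str, str]] = [("environment", first_observation)]
    for _ in range(num_turns):
        # The model acts as the agent under the fixed agent prompt...
        action = generate(agent_prompt, history)
        history.append(("agent", action))
        # ...then swaps roles and simulates the environment's response.
        observation = generate(env_prompt, history)
        history.append(("environment", observation))
    return history


# Usage with a stub "model" that reports which prompt it received and the history length.
def stub(system_prompt: str, hist: List[Tuple[str, str]]) -> str:
    return f"{system_prompt}:{len(hist)}"


conversation = self_role_play(stub, "AGENT", "ENV", "start", num_turns=2)
```

Because no real environment is queried, the collected conversations can serve as training data even when only the predefined agent prompt is available.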
