PromptTailor: Multi-turn Intent-Aligned Prompt Synthesis for Lightweight LLMs
Yizhou Xu, Janet Davis
TL;DR
PromptTailor is a compact, LoRA-tuned, 4-bit Llama3-8B prompt generator trained on a 12,300-dialogue synthetic dataset to transform user intents into rich, domain-aware prompts while preserving user preferences. The system employs a capabilities mechanism and a three-turn agent with retrieval-augmented generation (RAG) for intent-aligned optimization, enabling efficient edge deployment. Automated and human evaluations show substantial gains over chain-of-thought prompting and parity with state-of-the-art prompt optimization methods, with greater benefits for weaker models and fewer API calls. The work demonstrates that a lightweight student, guided by stronger teachers, can meaningfully improve open-ended responses across diverse LLMs and settings, highlighting practical benefits for on-device, privacy-preserving NLP.
Abstract
Lightweight language models remain attractive for on-device and privacy-sensitive applications, but their responses are highly sensitive to prompt quality. For open-ended generation, non-expert users often lack the knowledge or time to craft high-quality prompts consistently, so they rely on prompt optimization tools. A key challenge, however, is ensuring that the optimized prompts genuinely align with users' original intents and preferences. We introduce PromptTailor, a system for controllable prompt generation for open-ended text that improves model output quality through intent-aligned prompt synthesis. PromptTailor expands minimal user instructions into rich, domain-aware prompts while preserving the user's stated preferences. The system is a quantized Llama3-8B model fine-tuned with a lightweight LoRA adapter on 12,300 prompt-refinement dialogues spanning 41 everyday domains, distilled from three stronger LLMs. The adapter attaches to any Llama3-8B base, enabling edge deployment. In human and LLM-judge evaluations across multiple target models and optimization baselines, PromptTailor yields higher preference rates than chain-of-thought prompting and matches or surpasses state-of-the-art prompt optimization methods while requiring fewer model calls (e.g., 3 vs. 9). These results show that a compact student, guided by powerful teachers, can learn effective prompt-generation strategies that enhance response quality while maintaining alignment with user intent.
