Table of Contents
Fetching ...

PromptTailor: Multi-turn Intent-Aligned Prompt Synthesis for Lightweight LLMs

Yizhou Xu, Janet Davis

TL;DR

PromptTailor introduces a compact, LoRA-tuned, 4-bit Llama3-8B prompt generator trained on a 12,300-dialogue synthetic dataset to transform user intents into rich, domain-aware prompts while preserving user preferences. The system employs a capabilities mechanism and a three-turn agent with RAG for intent-aligned optimization, enabling efficient edge deployment. Automated and human evaluations show substantial gains over chain-of-thought prompting and parity with state-of-the-art prompt optimization methods, with greater benefits for weaker models and fewer API calls. The work demonstrates that a lightweight student, guided by stronger teachers, can meaningfully improve open-ended responses across diverse LLMs and settings, highlighting practical benefits for on-device, privacy-preserving NLP.

Abstract

Lightweight language models remain attractive for on-device and privacy-sensitive applications, but their responses are highly sensitive to prompt quality. For open-ended generation, non-expert users often lack the knowledge or time to consistently craft high-quality prompts, leading them to rely on prompt optimization tools. However, a key challenge is ensuring the optimized prompts genuinely align with users' original intents and preferences. We introduce PromptTailor, a system for controllable prompt generation for open-ended text that improves model output quality by intent-aligned prompt synthesis. PromptTailor expands minimal user instructions into rich, domain-aware prompts while preserving the user's stated preferences. The system is a quantized Llama3-8B model fine-tuned with a lightweight LoRA adapter on 12,300 prompt-refinement dialogues spanning 41 everyday domains, distilled from three stronger LLMs. The adapter attaches to any Llama3-8B base, enabling edge deployment. In human and LLM-judge evaluations across multiple target models and optimization baselines, PromptTailor yields higher preference rates than chain-of-thought prompting and matches or surpasses state-of-the-art prompt optimization methods while requiring fewer model calls (e.g., 3 vs. 9). These results show that a compact student, guided by powerful teachers, can learn effective prompt-generation strategies that enhance response quality while maintaining alignment with user intent.

PromptTailor: Multi-turn Intent-Aligned Prompt Synthesis for Lightweight LLMs

TL;DR

PromptTailor introduces a compact, LoRA-tuned, 4-bit Llama3-8B prompt generator trained on a 12,300-dialogue synthetic dataset to transform user intents into rich, domain-aware prompts while preserving user preferences. The system employs a capabilities mechanism and a three-turn agent with RAG for intent-aligned optimization, enabling efficient edge deployment. Automated and human evaluations show substantial gains over chain-of-thought prompting and parity with state-of-the-art prompt optimization methods, with greater benefits for weaker models and fewer API calls. The work demonstrates that a lightweight student, guided by stronger teachers, can meaningfully improve open-ended responses across diverse LLMs and settings, highlighting practical benefits for on-device, privacy-preserving NLP.

Abstract

Lightweight language models remain attractive for on-device and privacy-sensitive applications, but their responses are highly sensitive to prompt quality. For open-ended generation, non-expert users often lack the knowledge or time to consistently craft high-quality prompts, leading them to rely on prompt optimization tools. However, a key challenge is ensuring the optimized prompts genuinely align with users' original intents and preferences. We introduce PromptTailor, a system for controllable prompt generation for open-ended text that improves model output quality by intent-aligned prompt synthesis. PromptTailor expands minimal user instructions into rich, domain-aware prompts while preserving the user's stated preferences. The system is a quantized Llama3-8B model fine-tuned with a lightweight LoRA adapter on 12,300 prompt-refinement dialogues spanning 41 everyday domains, distilled from three stronger LLMs. The adapter attaches to any Llama3-8B base, enabling edge deployment. In human and LLM-judge evaluations across multiple target models and optimization baselines, PromptTailor yields higher preference rates than chain-of-thought prompting and matches or surpasses state-of-the-art prompt optimization methods while requiring fewer model calls (e.g., 3 vs. 9). These results show that a compact student, guided by powerful teachers, can learn effective prompt-generation strategies that enhance response quality while maintaining alignment with user intent.

Paper Structure

This paper contains 22 sections, 6 figures, 4 tables.

Figures (6)

  • Figure 1: An example of integrating PromptTailor into an existing Llama3-8B model. While multiple integration strategies are possible, our proposed structure allows the user to choose whether to activate the prompt generation capability. The intermediate output is then fed directly back into the model to produce results. The PromptTailor operates as a separate LoRA adapter agent, ensuring that it does not impact other LoRA adapters or components within the model.
  • Figure 2: Agent System Structure: The agent processes initial user input through three consecutive model calls. The agent's dialogue flow mirrors that of our synthetic data. The system utilizes RAG to retrieve historical user preference information from a local preference database. Additionally, capabilities information is collected from various sources by a dedicated collector component to help the LLM adjust its responses accordingly in subsequent turns.
  • Figure 3: The bar chart version of the table above. Each cell shows the number of judgments favoring the response from the raw user intent and preference (1), the optimized prompt (2), or indicating no difference (0).
  • Figure 4: The bar chart version of the table above. Each cell shows the number of judgments favoring the response from the raw user intent and preference (1), the optimized prompt (2), or indicating no difference (0).
  • Figure 5: The bar chart version of the table above. Each cell shows the number of judgments favoring the response from "Without Cap" (1), "Cap" (2), or indicating no difference (0).
  • ...and 1 more figures