
Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization

Yupeng Chang, Yi Chang, Yuan Wu

TL;DR

Transfer-Prompting introduces a two-stage prompt optimization framework that first constructs generalizable source prompts and then specializes them to target tasks via fine-tuning. A reference LLM proposes prompts while a scorer LLM, guided by a multi-dimensional objective prompt evaluator, provides feedback across multiple evaluation metrics that drives iterative improvement. Across 25 LLMs and 9 datasets, the method yields significant gains in instruction following, accuracy, and calibration, demonstrating strong cross-task adaptability in the medical, legal, and financial domains. The approach outperforms established baselines and offers a practical route to robust, domain-aware prompt optimization in real-world AI systems.
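The scorer LLM reports feedback on several dimensions (instruction following, accuracy, calibration), but the exact aggregation is not stated here. The following is a minimal Python sketch under loudly stated assumptions: calibration is measured with the standard binned expected calibration error (ECE) over verbalized confidences, every metric is mapped to [0, 1] with higher meaning better, and the per-metric values are combined by a weighted average. The metric names, weights, and example numbers are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[int],
                               n_bins: int = 10) -> float:
    """Standard binned ECE: sum over bins of (bin size / n) * |accuracy - confidence|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally long, non-empty inputs")
    bins: Dict[int, List[Tuple[float, int]]] = {}
    for conf, ok in zip(confidences, correct):
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes in the top bin.
        b = min(int(conf * n_bins), n_bins - 1)
        bins.setdefault(b, []).append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins.values():
        acc = sum(ok for _, ok in members) / len(members)
        avg_conf = sum(c for c, _ in members) / len(members)
        ece += len(members) / n * abs(acc - avg_conf)
    return ece


def aggregate_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical weighted average of metrics that are already in [0, 1]."""
    total = sum(weights.values())
    return sum(w * metrics[name] for name, w in weights.items()) / total


# Hypothetical results for one candidate prompt on four questions.
confidences = [0.9, 0.8, 0.6, 0.95]   # verbalized confidences parsed from answers
correct = [1, 1, 0, 1]                # 1 if the answer was right
ece = expected_calibration_error(confidences, correct)
metrics = {
    "accuracy": sum(correct) / len(correct),
    "instruction_following": 0.75,    # hypothetical rate judged by the scorer LLM
    "calibration": 1.0 - ece,         # flip ECE so that higher is better
}
weights = {"accuracy": 0.5, "instruction_following": 0.25, "calibration": 0.25}
score = aggregate_score(metrics, weights)
print(f"ECE={ece:.4f}  aggregate={score:.4f}")
```

With these toy numbers the ECE is 0.2375 and the weighted score is about 0.753. A weighted average is only one way to collapse several objectives into a single optimization signal; a Pareto-style selection would be an equally plausible alternative.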

Abstract

Large language models (LLMs) face significant challenges when balancing multiple high-level objectives, such as generating coherent, relevant, and high-quality responses while adapting efficiently across diverse tasks. To address these challenges, we introduce Transfer-Prompting, a novel two-stage framework designed to enhance cross-task adaptation in prompt generation. The framework comprises two key components: (1) source prompt construction, which refines the original prompts on source task datasets to generate source prompts with enhanced generalization ability, and (2) target prompt generation, which enhances the cross-task adaptation of target prompts by fine-tuning a set of high-scoring source prompts on task-specific datasets. In each optimization cycle, a reference LLM generates candidate prompts based on the historical prompt-score pairs and task descriptions contained in our reference prompt. These candidate prompts are refined iteratively, while a scorer LLM evaluates their effectiveness using the multi-dimensional metrics defined in the objective prompt evaluator, a novel contribution of this work that provides a holistic evaluation of prompt quality and task performance. This feedback loop drives continuous refinement, optimizing both prompt quality and task-specific outcomes. We validate Transfer-Prompting through extensive experiments on 25 LLMs, including 7 foundational models and 18 specialized models, evaluated on 9 diverse datasets. The results demonstrate that Transfer-Prompting significantly improves task-specific performance, highlighting its potential for enhancing cross-task adaptation in LLMs. The code is available at https://github.com/llm172/Transfer-Prompting.
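The abstract describes a two-stage loop: a reference LLM proposes candidates from historical prompt-score pairs, a scorer evaluates them, and the high-scoring source prompts seed the target stage. The sketch below shows only that control flow in Python, assuming both LLMs can be abstracted as plain callables. `optimize`, `transfer_prompting`, `keep`, `top_k`, and the toy proposer and keyword scorers are hypothetical stand-ins, not the code released at the repository above.

```python
from typing import Callable, List, Tuple

ScoredPrompt = Tuple[str, float]


def optimize(seed_prompts: List[str],
             propose: Callable[[List[ScoredPrompt]], List[str]],
             score: Callable[[str], float],
             n_steps: int,
             keep: int) -> List[ScoredPrompt]:
    """One stage: propose from the prompt-score history, score, keep the best `keep`."""
    history = [(p, score(p)) for p in seed_prompts]
    for _ in range(n_steps):
        seen = {p for p, _ in history}
        for cand in propose(history):
            if cand not in seen:          # skip prompts already in the history
                seen.add(cand)
                history.append((cand, score(cand)))
        history = sorted(history, key=lambda ps: ps[1], reverse=True)[:keep]
    return history


def transfer_prompting(original_prompts, propose, score_source, score_target,
                       n_steps=5, keep=8, top_k=3) -> List[ScoredPrompt]:
    # Stage 1: source prompt construction, scored on source-task data.
    source = optimize(original_prompts, propose, score_source, n_steps, keep)
    # Stage 2: target prompt generation, seeded with the top-k source prompts
    # and re-scored on the target task.
    seeds = [p for p, _ in source[:top_k]]
    return optimize(seeds, propose, score_target, n_steps, keep)


# Toy stand-ins: the "reference LLM" appends an edit to the current best prompt,
# and the "scorers" count hypothetical keywords instead of running real evaluations.
EDITS = [" Think step by step.", " Be concise.", " Cite the relevant guideline."]


def toy_propose(history: List[ScoredPrompt]) -> List[str]:
    best = max(history, key=lambda ps: ps[1])[0]
    return [best + e for e in EDITS if e not in best]


def toy_score_source(p: str) -> float:
    return sum(k in p.lower() for k in ("step by step", "concise"))


def toy_score_target(p: str) -> float:
    return sum(k in p.lower() for k in ("step by step", "concise", "cite"))


result = transfer_prompting(["Answer the medical question."], toy_propose,
                            toy_score_source, toy_score_target)
print(result[0])
```

In a real setup, `propose` would query the reference LLM with the reference prompt, and each scorer would run the scorer LLM with the objective prompt evaluator on the source or target dataset. Because the seed prompts stay in the history, the best target score can never fall below that of the best seed.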

Paper Structure

This paper contains 27 sections, 6 equations, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Illustration of the two-stage automatic prompt optimization framework in Transfer-Prompting. The framework consists of two optimization stages: source prompt construction and target prompt generation. It involves four key components: the reference LLM, the reference prompt, the scorer LLM, and the corresponding objective prompt evaluator.
  • Figure 2: An example of the reference prompt for the reference LLMs (PaLM 2-L and PaLM 2-L-IT) on the medical datasets. The generated instruction is inserted at the position marked by <INS> in the input. Green text gives the instructions about prompts and scores; orange text gives examples of how to apply the instruction; blue text lists the prompt-score pairs.
  • Figure 3: Comparative performance of various medical, legal, and financial models. Confidence is computed with the verbalized confidence method.
  • Figure 4: Score curves of the two-stage prompt optimization process of Transfer-Prompting on MMLU medical-related tasks.
  • Figure 5: Zero-shot performance of different medical-domain LLMs on MMLU medical-related tasks, evaluated using logits.
  • ...and 4 more figures
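Figure 2 describes the reference prompt as three parts: instructions about prompts and scores, exemplars showing how an instruction is applied at the <INS> position, and the list of prompt-score pairs. The Python sketch below assembles a meta-prompt with that shape. The wording, the ascending score order, the `max_pairs` cap, and the function names `build_reference_prompt` and `insert_instruction` are assumptions for illustration, not the paper's actual template.

```python
from typing import List, Tuple


def build_reference_prompt(task_description: str,
                           scored_prompts: List[Tuple[str, float]],
                           exemplars: List[Tuple[str, str]],
                           max_pairs: int = 20) -> str:
    """Assemble a three-part meta-prompt in the spirit of Figure 2 (wording hypothetical)."""
    # Blue part: keep the best `max_pairs` prompt-score pairs, listed in ascending
    # score order so the strongest prompt appears last (an assumption).
    top = sorted(scored_prompts, key=lambda ps: ps[1])[-max_pairs:]
    pairs = "\n\n".join(f"text:\n{p}\nscore:\n{s:.1f}" for p, s in top)
    # Orange part: exemplars with an <INS> slot where the instruction will go.
    examples = "\n\n".join(f"input:\nQ: <INS>\n{q}\nA:\noutput:\n{a}"
                           for q, a in exemplars)
    # Green part: instructions about the prompts, the scores, and the request.
    return (f"{task_description}\n\n"
            "Below are previous instructions with their scores.\n\n"
            f"{pairs}\n\n"
            "Below are exemplars. Replace <INS> in each input with your "
            "instruction, then read the input and give an output.\n\n"
            f"{examples}\n\n"
            "Write a new instruction that differs from the old ones and "
            "scores as high as possible.")


def insert_instruction(template: str, instruction: str) -> str:
    """Fill the <INS> slot of an exemplar or evaluation input with an instruction."""
    return template.replace("<INS>", instruction)


# Hypothetical history for a medical QA task.
history = [("Answer the question.", 41.0),
           ("Answer as a clinician.", 63.5),
           ("Be brief.", 12.0)]
meta = build_reference_prompt("You are optimizing prompts for medical QA.", history,
                              [("Which vitamin deficiency causes scurvy?", "Vitamin C")],
                              max_pairs=2)
print(meta)
```

With `max_pairs=2`, the lowest-scoring prompt is dropped and the remaining two appear from lower to higher score, which lets the reference LLM see which phrasings have worked best most recently.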