TAPO: Task-Referenced Adaptation for Prompt Optimization

Wenxin Luo; Weirui Wang; Xiaopeng Li; Weibo Zhou; Pengyue Jia; Xiangyu Zhao

TL;DR

The paper tackles the inefficiency and lack of task specificity in automated prompt optimization by introducing TAPO, a multitask-aware framework that dynamically selects task-relevant metrics and evaluates prompts with a multi-metric score. TAPO combines a task-driven metric selection module, a metric fusion evaluator with dynamic weights, and an evolution-based prompt optimizer that uses mutation and tournament selection to progressively improve prompts. The core mechanism is the multi-objective score $S(\mathcal{P}) = \sum_{i=1}^{n} w_i \cdot M_i(\mathcal{P})$, which integrates multiple criteria such as similarity, diversity, perplexity, and complexity. Empirical results on six public datasets across GPT-3.5-turbo, GPT-4o, and Llama3-8B-Instruct show TAPO yields strong, task-adaptive performance and robust generalization, with ablations confirming the importance of both multi-metric evaluation and evolution-based optimization; the authors also release open-source code for replication.
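As a rough illustration of the weighted score $S(\mathcal{P}) = \sum_{i=1}^{n} w_i \cdot M_i(\mathcal{P})$, the sketch below fuses a set of metric functions with per-metric weights. The function names, the two toy metrics, and the weight values are hypothetical stand-ins chosen for the example; they are not the paper's metric implementations.

```python
from typing import Callable, Dict

# A metric maps a prompt string to a score; the [0, 1] range is an assumption.
Metric = Callable[[str], float]


def fused_score(prompt: str, metrics: Dict[str, Metric], weights: Dict[str, float]) -> float:
    """Return S(P) = sum_i w_i * M_i(P) over the selected metrics."""
    return sum(weights[name] * metric(prompt) for name, metric in metrics.items())


def toy_complexity(prompt: str) -> float:
    # Hypothetical stand-in: word count, capped at 20 words.
    return min(len(prompt.split()) / 20.0, 1.0)


def toy_similarity(prompt: str) -> float:
    # Hypothetical stand-in: 1.0 if the prompt contains a target phrase.
    return 1.0 if "step by step" in prompt.lower() else 0.0


metrics = {"complexity": toy_complexity, "similarity": toy_similarity}
weights = {"complexity": 0.25, "similarity": 0.75}  # illustrative weights, not from the paper
score = fused_score("Think step by step.", metrics, weights)
print(f"{score:.2f}")
```

Keeping the metrics and weights in dictionaries keyed by name means a task-specific selection step could swap in a different metric subset without changing the scoring function.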

Abstract

Prompt engineering can significantly improve the performance of large language models (LLMs), and automated prompt optimization (APO) has gained considerable attention because manual prompt design is time-consuming and laborious. However, much of the existing work in APO overlooks task-specific characteristics, producing prompts that lack domain specificity and are poorly suited to the target task. In this paper, we introduce TAPO, a multitask-aware prompt optimization framework composed of three key modules. First, a task-aware metric selection module is proposed to enhance task-specific prompt generation capabilities. Second, we present a multi-metric evaluation module to jointly evaluate prompts from multiple perspectives. Third, an evolution-based optimization framework is introduced for automatic prompt refinement, which improves adaptability across various tasks. Extensive experiments on six datasets demonstrate the effectiveness of our approach, and our code is publicly available.
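The evolution-based refinement can be pictured with a small loop that picks a parent by tournament selection, mutates it, and adds the child to the candidate pool. This is a minimal sketch under stated assumptions: the mutation operator (appending a hint phrase), the toy scoring function, the pool-trimming rule, and all names are hypothetical and are not taken from the paper, which does not specify these details in this section.

```python
import random
from typing import Callable, List

# Hypothetical hint phrases used by the toy mutation and toy score.
HINTS = ["think step by step", "be concise", "cite evidence"]


def toy_score(prompt: str) -> float:
    # Stand-in for a multi-metric score: fraction of hint phrases present.
    return sum(h in prompt for h in HINTS) / len(HINTS)


def toy_mutate(prompt: str, rng: random.Random) -> str:
    # Stand-in mutation: append one randomly chosen hint phrase.
    return f"{prompt} {rng.choice(HINTS)}"


def tournament_select(pool: List[str], score_fn: Callable[[str], float],
                      k: int, rng: random.Random) -> str:
    # Sample k contestants without replacement and return the best one.
    return max(rng.sample(pool, k), key=score_fn)


def evolve(seeds: List[str], score_fn: Callable[[str], float],
           mutate: Callable[[str, random.Random], str],
           generations: int = 10, k: int = 2, seed: int = 0) -> str:
    rng = random.Random(seed)
    pool = list(seeds)
    for _ in range(generations):
        parent = tournament_select(pool, score_fn, min(k, len(pool)), rng)
        pool.append(mutate(parent, rng))
        # Assumed trimming rule: drop the worst candidate to keep the pool size fixed.
        pool.remove(min(pool, key=score_fn))
    return max(pool, key=score_fn)


seeds = ["Answer the question.", "Solve the problem."]
best = evolve(seeds, toy_score, toy_mutate)
print(best, toy_score(best))
```

Because the worst candidate is dropped each generation, the best score in the pool never decreases, which makes the loop easy to reason about even with a noisy mutation operator.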

Paper Structure (14 sections, 1 equation, 2 figures, 3 tables)


Introduction
Methodology
Framework Overview
Dynamic Metric Selection
Task-Aware Prompt Evaluation
Evolution-Based Prompt Optimization
Experiment
Experiment Settings
Overall Performance (RQ1)
Task-Specific Prompt Performance (RQ2)
Open-Source LLM Performance (RQ3)
Ablation Study (RQ4)
Related Work
Conclusion

Figures (2)

Figure 1: The framework of TAPO. For Dynamic Metric Selection, we provide a task dataset example so that the LLM can select metrics and assign weights based on priority, creating task-specific evaluation metrics for Task-Aware Prompt Evaluation. For Evolution-Based Prompt Optimization, we employ a tournament selection algorithm to select and mutate the better-performing prompts, adding task-adapted prompts to the candidates.
Figure 2: Performance Comparison with Llama3-8B-Instruct.
