Prompt and Parameter Co-Optimization for Large Language Models

Xiaohe Bo, Rui Li, Zexu Sun, Quanyu Dai, Zeyu Zhang, Zihang Tian, Xu Chen, Zhenhua Dong

TL;DR

This paper introduces MetaTuner, a framework that jointly integrates prompt optimization and fine-tuning for LLM training. It uses two neural networks to generate prompts and parameters, respectively, and lets them share a common bottom encoding layer to enable knowledge sharing.

Abstract

Prompt optimization and fine-tuning are two major approaches to improving the performance of Large Language Models (LLMs). They enhance the capabilities of LLMs from complementary perspectives: the former through explicit natural language, and the latter through implicit parameter updates. However, prior work has typically studied them in isolation, leaving their synergistic potential largely underexplored. To bridge this gap, we introduce MetaTuner, a novel framework that jointly integrates prompt optimization and fine-tuning for LLM training. Specifically, we introduce two neural networks to generate prompts and parameters, respectively, while allowing them to share a common bottom encoding layer to enable knowledge sharing. Guided by the final supervised signals, our framework is optimized to discover the optimal combinations of prompts and parameters. Because prompt learning involves discrete optimization while fine-tuning operates in a continuous parameter space, we design a supervised regularization loss to train our framework effectively. Extensive experiments across diverse benchmarks show that our method consistently outperforms the baselines.
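
The structural idea in the abstract, one shared encoding feeding a prompt generator and a parameter generator, can be illustrated with a deliberately small sketch. This is not the authors' implementation. The class name `ToyMetaTuner`, the plain linear layers, the fixed list of candidate prompts (the framework generates prompts, whereas this toy only selects one), and the flat parameter-update vector are all assumptions made for illustration.

```python
import math
import random


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of small random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ToyMetaTuner:
    """Hypothetical toy: a shared encoder feeding a prompt head and a parameter head."""

    def __init__(self, in_dim, hidden_dim, candidate_prompts, param_dim, seed=0):
        rng = random.Random(seed)
        self.candidate_prompts = candidate_prompts
        # Shared bottom layer: both heads read the same hidden representation.
        self.shared_encoder = random_matrix(hidden_dim, in_dim, rng)
        # Prompt head: one score per candidate prompt (assumed simplification).
        self.prompt_head = random_matrix(len(candidate_prompts), hidden_dim, rng)
        # Parameter head: a continuous update vector for the downstream model.
        self.param_head = random_matrix(param_dim, hidden_dim, rng)

    def encode(self, query_features):
        """Map query features to the shared hidden representation."""
        return [math.tanh(h) for h in linear(self.shared_encoder, query_features)]

    def forward(self, query_features):
        """Return a discrete prompt choice and a continuous parameter update."""
        hidden = self.encode(query_features)
        prompt_scores = linear(self.prompt_head, hidden)
        # Discrete step: pick the highest-scoring prompt (argmax is not differentiable).
        best = max(range(len(prompt_scores)), key=prompt_scores.__getitem__)
        prompt = self.candidate_prompts[best]
        # Continuous step: the parameter update is a smooth function of the encoding.
        param_update = linear(self.param_head, hidden)
        return prompt, param_update


model = ToyMetaTuner(
    in_dim=4,
    hidden_dim=8,
    candidate_prompts=["Think step by step.", "Answer concisely.", "Show your work."],
    param_dim=6,
)
prompt, delta = model.forward([0.1, -0.3, 0.7, 0.2])
print(prompt, len(delta))
```

The argmax in `forward` makes the discrete/continuous mismatch mentioned in the abstract concrete: gradients can flow through the parameter head but not through the prompt selection, which is why joint training needs a dedicated objective.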

Paper Structure

This paper contains 27 sections, 9 equations, 11 figures, and 11 tables.

Figures (11)

  • Figure 1: Preliminary analysis of prompt optimization and fine-tuning methods. The left subfigure compares four representative prompt optimization strategies with four popular fine-tuning methods. In the right subfigure, we use SFT as the backbone fine-tuning method and further evaluate the model with different training prompts; the corresponding results are marked with stars. Additional implementation details can be found in Appendix \ref{app:explore_implementation}.
  • Figure 2: Illustration of the MetaTuner framework. The input query is first encoded by the meta encoder, and then two parallel decoders are utilized to generate the prompts and parameters, respectively, which are finally applied to the downstream actor model for problem solving.
  • Figure 3: Performance with different proportions of shared decoder layers.
  • Figure 4: Comparison between our method and Gumbel-Softmax.
  • Figure 5: Comparison results between different methods for generating $D_2$.
  • ...and 6 more figures