Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning

Wenjin Liu; Haoran Luo; Xueyuan Lin; Haoming Liu; Tiesunlong Shen; Jiapu Wang; Rui Mao; Erik Cambria

TL;DR

Prompt-R1 is a collaborative automatic prompting framework in which a small-scale LLM acts as an agent that learns, through end-to-end reinforcement learning, to prompt a large-scale LLM. With a dual-constrained reward and a plug-and-play architecture, the approach supports multi-turn interactions that improve correctness and generation quality without fine-tuning the large model. Empirical results on multi-hop reasoning, QA, math computation, and text generation show consistent improvements over strong baselines and good generalization to out-of-distribution tasks, including transfer to different LLM environments. The work highlights the potential of cross-LLM collaboration to reduce prompting costs while improving robustness and adaptability in complex reasoning tasks.

Abstract

Recently, advanced large language models (LLMs) have emerged at an increasingly rapid pace. However, when faced with complex problems, most users cannot provide accurate and effective prompts, which limits the performance of LLMs. To address this challenge, we propose Prompt-R1, an end-to-end reinforcement learning framework in which a small-scale LLM collaborates with large-scale LLMs, taking the place of the user in the interaction to solve problems more effectively. This collaboration is cast as a multi-turn prompt interaction, where the small-scale LLM thinks and generates prompts, and the large-scale LLM performs complex reasoning. A dual-constrained reward is designed to optimize for correctness, generation quality, and reasoning accuracy. Prompt-R1 provides a plug-and-play framework that supports both inference and training with various large-scale LLMs. Experiments on multiple public datasets show that Prompt-R1 significantly outperforms baseline models across tasks. Our code is publicly available at https://github.com/QwenQKing/Prompt-R1.
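
The multi-turn prompt interaction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `collaborate`, `toy_agent`, and `toy_environment`, the `PROMPT:`/`ANSWER:` message convention, and the `max_turns` budget are all assumptions made for illustration, and both models are replaced by plain Python functions so the loop runs without any LLM. Reinforcement learning and the reward are omitted entirely.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper): each model is a text -> text function.
Agent = Callable[[str], str]        # small-scale LLM: sees the history, emits a prompt or an answer
Environment = Callable[[str], str]  # large-scale LLM: receives a prompt, returns a response


def collaborate(question: str, agent: Agent, environment: Environment,
                max_turns: int = 4) -> Tuple[str, List[Tuple[str, str]]]:
    """Run a multi-turn prompt interaction; return (final answer, transcript)."""
    history = f"QUESTION: {question}"
    transcript: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        action = agent(history)
        # Assumed convention: the agent signals it is done with an "ANSWER:" prefix.
        if action.startswith("ANSWER:"):
            return action[len("ANSWER:"):].strip(), transcript
        # Otherwise treat the action as a prompt for the large-scale LLM.
        if action.startswith("PROMPT:"):
            prompt = action[len("PROMPT:"):].strip()
        else:
            prompt = action.strip()
        response = environment(prompt)
        transcript.append((prompt, response))
        history += f"\nPROMPT: {prompt}\nRESPONSE: {response}"
    # Turn budget exhausted: fall back to the last response, if any.
    return (transcript[-1][1] if transcript else ""), transcript


def toy_agent(history: str) -> str:
    """Stand-in for the small-scale LLM: ask once, then report what came back."""
    if "RESPONSE:" not in history:
        return "PROMPT: Compute 17 * 3 step by step."
    last_response = history.rsplit("RESPONSE:", 1)[1].strip()
    return f"ANSWER: {last_response}"


def toy_environment(prompt: str) -> str:
    """Stand-in for the large-scale LLM: always replies with a fixed string."""
    return "51"


answer, transcript = collaborate("What is 17 * 3?", toy_agent, toy_environment)
print(answer)
print(transcript)
```

Because the environment is just a callable, a different large-scale LLM can be swapped in without touching the agent, which is one way to read the plug-and-play property the abstract claims.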
