Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs
Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, Xin Zhou, Jiaming Zhou, Haoqin Sun
TL;DR
This work addresses the challenge of manually crafting expert role prompts for LLMs by introducing self-prompt tuning, which enables autonomous role-playing through parameter updates rather than external prompts. The authors construct LIMA-Role by augmenting a small instruction-tuning corpus with GPT-4-generated role prompts, then fine-tune Mistral-7B and Llama-2-7B so that the models generate role prompts for new questions themselves. Evaluations across eight NLP benchmarks and an open-ended question test show that self-prompt tuned models generally outperform standard instruction-tuned baselines, though they lag behind official or ChatGPT baselines on some open-ended tasks because of the small scale of the training data. They release the LIMA-Role dataset, the fine-tuned models, and code to promote automation of prompting strategies and inspire future work on broader prompting techniques.
Abstract
Recent advancements in LLMs have showcased their remarkable role-playing capabilities: they can accurately simulate the dialogue styles and cognitive processes of various roles based on different instructions and contexts. Studies indicate that assigning LLMs the roles of experts, a strategy known as role-play prompting, can enhance their performance in the corresponding domains. However, the prompt must be manually designed for the given problem, which requires expertise and iterative modification. To this end, we propose self-prompt tuning, which trains LLMs to generate role-play prompts themselves through fine-tuning. Using the LIMA dataset as our foundational corpus, we employ GPT-4 to annotate a role-play prompt for each data point, resulting in the LIMA-Role dataset. We then fine-tune LLMs such as Llama-2-7B and Mistral-7B on LIMA-Role. Consequently, the self-prompt tuned LLMs can automatically generate expert role prompts for any given question. We extensively evaluate self-prompt tuned LLMs on widely used NLP benchmarks and an open-ended question test. Our empirical results show that self-prompt tuned LLMs outperform standard instruction-tuned baselines across most datasets. This highlights the great potential of using fine-tuning to enable LLMs to self-prompt, thereby automating complex prompting strategies. We release the dataset, models, and code at https://anonymous.4open.science/r/Self-Prompt-Tuning-739E/.
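To make the data construction step concrete, the following is a minimal sketch of how one LIMA-Role style training pair could be assembled. It assumes the annotated role prompt is placed at the start of the target text, ahead of the original answer, so that a fine-tuned model learns to emit the role prompt before answering. The abstract does not specify the exact layout. The names RoleExample and to_training_pair, the "input"/"target" keys, and the sample text are all hypothetical and are not taken from the released code or dataset.

```python
from dataclasses import dataclass


@dataclass
class RoleExample:
    """One annotated record: a question, a role prompt, and an answer (hypothetical schema)."""
    question: str
    role_prompt: str  # in the paper this annotation comes from GPT-4; here it is hand-written
    answer: str


def to_training_pair(ex: RoleExample) -> dict:
    """Build an input/target pair for supervised fine-tuning.

    Assumption: the target is the role prompt followed by the answer,
    separated by a blank line, so the model generates the role itself.
    """
    target = f"{ex.role_prompt.strip()}\n\n{ex.answer.strip()}"
    return {"input": ex.question.strip(), "target": target}


# Illustrative record, not drawn from LIMA or LIMA-Role.
example = RoleExample(
    question="Why is the sky blue?",
    role_prompt="You are an atmospheric physicist.",
    answer="Sunlight scatters off air molecules, and shorter (blue) wavelengths scatter most.",
)
pair = to_training_pair(example)
print(pair["target"])
```

Under this assumed layout, nothing needs to be prepended at inference time: the model receives only the question and produces both the role prompt and the answer.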
