LLM Roleplay: Simulating Human-Chatbot Interaction

Hovhannes Tamoyan; Hendrik Schuff; Iryna Gurevych

TL;DR

This paper proposes LLM Roleplay: a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction and finds that this method can simulate human-chatbot dialogues with a high indistinguishability rate.

Abstract

The development of chatbots requires collecting a large number of human-chatbot dialogues to reflect the breadth of users' sociodemographic backgrounds and conversational goals. However, the resource requirements to conduct the respective user studies can be prohibitively high and often only allow for a narrow analysis of specific dialogue goals and participant demographics. In this paper, we propose LLM Roleplay: a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction. LLM Roleplay can be applied to generate dialogues with any type of chatbot and uses large language models (LLMs) to play the role of textually described personas. To validate our method, we collect natural human-chatbot dialogues from different sociodemographic groups and conduct a user study to compare these with our generated dialogues. We evaluate the capabilities of state-of-the-art LLMs in maintaining a conversation during their embodiment of a specific persona and find that our method can simulate human-chatbot dialogues with a high indistinguishability rate.
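The interaction described in the abstract can be illustrated with a minimal sketch: an inquirer model is instructed with a persona and a goal and exchanges utterances with a responder chatbot. This is not the authors' implementation. The names `Dialogue`, `build_inquirer_prompt`, `simulate_dialogue`, the prompt wording, the fixed `max_turns` stopping rule, and the toy models are all assumptions made for illustration; in practice the two callables would wrap real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interface: a "model" maps a history of (role, text) pairs
# to the next utterance. A real setup would wrap an LLM call here.
Turn = Tuple[str, str]
Model = Callable[[List[Turn]], str]


@dataclass
class Dialogue:
    persona: str
    goal: str
    turns: List[Turn] = field(default_factory=list)


def build_inquirer_prompt(persona: str, goal: str) -> str:
    # Illustrative wording only; the paper's actual prompt is not reproduced here.
    return f"You are {persona}. Chat with a chatbot to achieve this goal: {goal}."


def simulate_dialogue(persona: str, goal: str, inquirer: Model,
                      responder: Model, max_turns: int = 3) -> Dialogue:
    # Assumption: the dialogue stops after a fixed number of exchanges.
    instruction = ("system", build_inquirer_prompt(persona, goal))
    dialogue = Dialogue(persona, goal)
    for _ in range(max_turns):
        # The inquirer sees the persona/goal instruction plus the dialogue so far.
        prompt = inquirer([instruction] + dialogue.turns)
        dialogue.turns.append(("inquirer", prompt))
        # The responder sees only the dialogue, never the persona description.
        response = responder(dialogue.turns)
        dialogue.turns.append(("responder", response))
    return dialogue


# Toy stand-ins for the two models, so the sketch runs without any LLM.
def toy_inquirer(history: List[Turn]) -> str:
    asked = sum(1 for role, _ in history if role == "inquirer")
    return f"question {asked + 1}"


def toy_responder(history: List[Turn]) -> str:
    return f"answer to: {history[-1][1]}"


demo = simulate_dialogue("a 30-year-old teacher", "plan a lesson",
                         toy_inquirer, toy_responder, max_turns=2)
for role, text in demo.turns:
    print(f"{role}: {text}")
```

Keeping the persona instruction on the inquirer side only reflects the idea that the responder can be any chatbot, which interacts with the simulated user as it would with a human.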

Paper Structure (38 sections, 6 equations, 18 figures, 10 tables, 2 algorithms)

Introduction
Large Language Model Roleplay
Persona ($\mathcal{P}$)
Goal ($\mathcal{G}$)
Subject ($\mathcal{S}$)
Utterance ($\mathcal{U}$)
Dialogue ($\mathcal{D}$)
Experiments
Persona-Specific Dialogue Collection
Impact of Persona-Specific Information
Failure Cases
Prompt not in double-quotes.
Incoherent output.
Incoherent output of responder.
Inquirer self-reply.
...and 23 more sections

Figures (18)

Figure 1: Schematic illustration of our method: A textual description of a persona and a goal (top) is used to instruct the inquirer ($\mathcal{S_I}$) model to embody the given persona (left) and engage in a dialogue with the responder ($\mathcal{S_R}$) chatbot (right). We show that dialogues generated between the inquirer LLM and the responder chatbot can effectively simulate human-chatbot interaction.
Figure 2: The distribution of detectability (left) and undetectability rates (right) per model for Llama-2, Mixtral, Vicuna, and GPT4. Each bar is stacked by confidence level: "Somewhat confident", "Confident", and "Very Confident". We show that Mixtral has a relatively high undetectability rate of 44%, followed by GPT4 at 35%, Llama-2 at 33.5%, and Vicuna at 22.5%. The total (un)detectability rates for each model are shown in gray.
Figure 3: Age distribution of participants for persona-specific dialogue collection study
Figure 4: Race distribution of participants for persona-specific dialogue collection study
Figure 5: Gender distribution of participants for persona-specific dialogue collection study
...and 13 more figures
