Designing and Evaluating Dialogue LLMs for Co-Creative Improvised Theatre

Boyd Branch; Piotr Mirowski; Kory Mathewson; Sophia Ppali; Alexandra Covaci

TL;DR

This study investigates co-creative improvised theatre with multi-party dialogue between humans and LLM-powered agents in live performances. It deploys a participatory human-in-the-loop framework in which an Operator manages scene context and a Curator selects AI-generated lines, across 26 Edinburgh Fringe shows, using three LLMs (GPT-3.5/4, PaLM 2, Llama 2). Through audience and actor surveys, logs, and post-hoc analysis, the work finds that audiences and performers perceive AI as a creativity partner rather than a fully autonomous performer, and it highlights technical and ethical challenges such as latency, turn-taking, and copyright concerns. The findings inform design principles for future live AI performances, emphasizing stronger multi-party dialogue capabilities, improved data provisioning, and human-centered interfaces to sustain engaging co-creative experiences in the arts.

Abstract

Social robotics researchers are increasingly interested in multi-party trained conversational agents. With a growing demand for real-world evaluations, our study presents Large Language Models (LLMs) deployed in a month-long live show at the Edinburgh Festival Fringe. This case study investigates human improvisers co-creating with conversational agents in a professional theatre setting. We explore the technical capabilities and constraints of on-the-spot multi-party dialogue, providing comprehensive insights from both audience and performer experiences with AI on stage. Our human-in-the-loop methodology underlines the challenges these LLMs face in generating context-relevant responses, stressing the user interface's crucial role. Audience feedback indicates an evolving interest in AI-driven live entertainment and direct human-AI interaction, and a diverse range of expectations about AI's conversational competence and utility as a creativity support tool. Human performers express immense enthusiasm and varied satisfaction, while evolving public opinion highlights mixed emotions about AI's role in the arts.

Paper Structure (38 sections, 7 figures)

Introduction
Staging AI in Theatre Festival Performances
Participatory Human-in-the-Loop Design
Iterative Design with the Theatre Cast
Designing for the Theatre Audience
Speed Dating
Wedding Speech
Couples' Therapy and Meet the Parents
Hero's Journey
Improvised TED Talk and Movie Pitch
Designing AI Curation Interfaces
Audience and Actor Surveys After Performances
Survey Design
Data
Post-Performance Analysis
...and 23 more sections

Figures (7)

Figure 1: Cast of Improbotics performing AI-based improv theatre. The "Cyborg" is wearing an earphone connected to a radio system that receives text-to-speech audio of LLM-generated lines, which are curated by an operator. The LLM is prompted by speech recognition. Photo: Lidia Crisafulli.
Figure 2: Screen capture of the AI Operator interface. At the top (in red) is the input box for the human character's name and for lines of dialogue (this input box serves as a backup in case speech recognition does not work properly). Below (in blue) is the input box for the AI character's name and for scene context metadata, typed by the operator. Below are several buttons to rapidly input scene-specific prompts such as "getting therapy" or "behaving in a sarcastic way". The interface then shows multiple lines: AI-generated lines are in black, speech recognition lines are in pink, and the curator-selected lines are in cyan.
Figure 3: Screen capture of the AI Curator interface, on a tablet. The latest speech recognition result is visible on top. Immediately below are buttons to scroll down to the latest AI-generated line, or to quickly input metadata ("more punny" or "more snarky") for the language agent. Generated lines are in white and curator-selected lines in violet.
Figure 4: Example of instruction prompt and results generated for three different LLMs.
Figure 5: Dialogue from an improvised scene between Paul, Julie (his mother) and the Cyborg (Paul's date), in which the Cyborg meets Paul's parents, with the suggestion: ketchup soup. In this dialogue extract (recorded using speech recognition), the Cyborg says 3 lines, marked with solid blue, violet and pink arrows: "It's warming, comforting, and perfect for a cozy night in", "Unique flavors and dishes always make for a memorable meal" and "I can already tell where Paul gets his hospitality from". All LLM-generated lines are shown on the right side of the figure as a sequence of lines we call the AI Stream. Red dashed arrows show which dialogue sentence triggered which LLM-generated line.
...and 2 more figures
