The Theater Stage as Laboratory: Review of Real-Time Comedy LLM Systems for Live Performance
Piotr Wojciech Mirowski, Boyd Branch, Kory Wallace Mathewson
TL;DR
The paper argues that evaluating AI humor requires live, audience-facing contexts where real-time feedback and performance constraints reveal strengths and limitations of generative systems. It surveys a range of live shows and experimental formats—from robot performers and adversarial AI improv to AI-assisted world-building and digital-space performances—to classify the key challenges: embodiment and human–machine competition, timing and interaction, and human interpretation of AI output. It identifies design patterns such as human-in-the-loop line curation, AR-assisted line delivery, and AI as a narrative or inspirational partner, and it outlines evaluation approaches grounded in audience engagement, laughter metrics, and creativity-support tool metrics. The work advocates reframing AI humor as a collaborative creativity tool within live performance, with significant implications for system design, ethics, and future research in AI-driven entertainment.
Abstract
In this position paper, we review the eclectic recent history of academic and artistic works involving computational systems for humor generation, and focus specifically on live performance. We make the case that AI comedy should be evaluated in live conditions, in front of audiences sharing either physical or online spaces, and under real-time constraints. We further suggest that improvised comedy is therefore the perfect substrate for deploying and assessing computational humor systems. Using examples of successful AI-infused shows, we demonstrate that live performance raises three sets of challenges for computational humor generation: 1) questions around robotic embodiment, anthropomorphism and competition between humans and machines, 2) questions around comedic timing and the nature of audience interaction, and 3) questions about the human interpretation of seemingly absurd AI-generated humor. We argue that these questions impact the choice of methodologies for evaluating computational humor, as any such method needs to work around the constraints of live audiences and performance spaces. These questions also highlight the different types of collaborative relationships between human comedians and AI tools.
