Navigating the Synchrony-Stability Frontier in Adaptive Chatbots
T. James Brandt
TL;DR
The paper establishes a formal synchrony–stability frontier for adaptive chatbots, showing that constraining turn-by-turn stylistic changes via bounded policies (Cap, EMA, Dead-Band, and Hybrids) yields substantial gains in persona stability and prompt legibility with only modest losses in immediate synchrony. It introduces an 8-dimensional style vector and a base+delta prompting framework within a closed-loop control system, and validates the approach on a 162-participant human-log dataset, three public corpora, and LLM-in-the-loop experiments with GPT-4.1 nano and Claude Sonnet 4. The trade-off holds across these settings: Pareto-efficient policies on the frontier improve coherence and maintainability (higher prompt legibility, fewer register flips) while preserving user experience, which gives concrete guidelines for deploying adaptive conversational agents. The accompanying artifact lets researchers reproduce the results and explore policy choices in other domains, informing the principled design of trustworthy, adaptive AI systems.
Abstract
Adaptive chatbots that mimic a user's linguistic style can build rapport and engagement, yet unconstrained mimicry risks an agent that feels unstable or sycophantic. We present a computational evaluation framework that makes the core design tension explicit: balancing moment-to-moment linguistic synchrony against long-term persona stability. Using an 8-dimensional style vector and a closed-loop "base+delta" prompting architecture, we simulate and compare explicit adaptation policies on a human-log dataset: Uncapped, Cap, Exponential Moving Average (EMA), Dead-Band, and Hybrids. Our analysis maps a clear Pareto frontier: bounded policies achieve substantial gains in stability at a modest cost to synchrony. For example, a Hybrid (EMA+Cap) policy raises stability from 0.542 to 0.878 (+62%) while reducing synchrony by only 17%. We confirm this trade-off through large-scale replications on three public corpora (DailyDialog, Persona-Chat, EmpatheticDialogues) and LLM-in-the-loop validation across two model families. Furthermore, we quantify "prompt legibility," showing that frontier policies reduce instruction churn and cut jarring register flips (major tone changes) from 0.254 to 0.092, yielding systems that are easier to reason about and maintain. Taken together, our contributions are a general evaluation harness for style adaptation; a systematic ablation that identifies Pareto-efficient policies; robust validation across diverse datasets and models; and novel legibility metrics linking policy choices to system maintainability.
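To make the policy families named in the abstract concrete, the following is a minimal sketch of how each one might update the agent's style vector for a single turn. The abstract does not give the update rules, so everything here is an assumption made for illustration: the function names, the parameters `max_step`, `alpha`, and `threshold` with their default values, the per-dimension form of the clipping, and the convention that each of the 8 style dimensions lies in [0, 1].

```python
from typing import List

# Hypothetical representation: an 8-dimensional style vector whose entries
# are assumed to lie in [0, 1]. The paper's actual dimensions are not shown here.
Style = List[float]


def uncapped(current: Style, target: Style) -> Style:
    """Full mimicry: jump straight to the user's observed style."""
    return list(target)


def cap(current: Style, target: Style, max_step: float = 0.1) -> Style:
    """Move toward the target, limiting each dimension's change to max_step."""
    return [c + max(-max_step, min(max_step, t - c))
            for c, t in zip(current, target)]


def ema(current: Style, target: Style, alpha: float = 0.3) -> Style:
    """Exponential moving average: blend the target in with weight alpha."""
    return [(1 - alpha) * c + alpha * t for c, t in zip(current, target)]


def dead_band(current: Style, target: Style, threshold: float = 0.05) -> Style:
    """Ignore differences at or below threshold; otherwise adopt the target."""
    return [t if abs(t - c) > threshold else c
            for c, t in zip(current, target)]


def hybrid_ema_cap(current: Style, target: Style,
                   alpha: float = 0.3, max_step: float = 0.1) -> Style:
    """One possible hybrid: smooth with EMA, then cap the resulting step."""
    return cap(current, ema(current, target, alpha), max_step)


if __name__ == "__main__":
    agent = [0.5] * 8   # assumed current agent style
    user = [1.0] * 8    # assumed observed user style
    for policy in (uncapped, cap, ema, dead_band, hybrid_ema_cap):
        print(policy.__name__, [round(x, 3) for x in policy(agent, user)])
```

Under these assumed rules, the bounded policies differ from Uncapped only in how much of the gap between agent and user style they close per turn, which is the mechanism behind the synchrony-versus-stability trade-off the abstract describes. The stability figure quoted above can be checked the same way: (0.878 - 0.542) / 0.542 ≈ 0.62, matching the reported +62%.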
