Proactive Conversational Agents with Inner Thoughts
Xingyu Bruce Liu, Shitao Fang, Weiyan Shi, Chien-Sheng Wu, Takeo Igarashi, Xiang Anthony Chen
TL;DR
The paper tackles the challenge of making AI agents proactively engage in multi-party conversations by moving beyond next-speaker prediction to the Inner Thoughts framework. It combines a formative human study with a dual-process thinking model to generate covert thoughts, evaluate the intrinsic motivation to express them, and decide when to participate. Across simulated and real-user evaluations, Inner Thoughts outperforms next-speaker baselines in coherence, turn-taking, engagement, and perceived intelligence, and users preferred its socially adept behavior. The work contributes a novel cognitive-inspired architecture, empirical heuristics for intrinsic motivation, and two open-source implementations (a playground and Swimmy Slackbot) that enable scalable exploration of proactive, human-like AI in social settings.
Abstract
One of the long-standing aspirations in conversational AI is to allow agents to autonomously take the initiative in conversations, i.e., to be proactive. This is especially challenging for multi-party conversations. Prior NLP research focused mainly on predicting the next speaker from contexts like preceding conversations. In this paper, we demonstrate the limitations of such methods and rethink what it means for AI to be proactive in multi-party, human-AI conversations. We propose that just like humans, rather than merely reacting to turn-taking cues, a proactive AI formulates its own inner thoughts during a conversation, and seeks the right moment to contribute. Through a formative study with 24 participants and inspiration from linguistics and cognitive psychology, we introduce the Inner Thoughts framework. Our framework equips AI with a continuous, covert train of thoughts in parallel to the overt communication process, which enables it to proactively engage by modeling its intrinsic motivation to express these thoughts. We instantiated this framework into two real-time systems: an AI playground web app and a chatbot. Through a technical evaluation and user studies with human participants, our framework significantly surpasses existing baselines on aspects like anthropomorphism, coherence, intelligence, and turn-taking appropriateness.
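The core idea described above (covert thoughts held alongside the conversation, each weighed by the motivation to express it) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Thought` class, the `select_thought` function, the numeric motivation scores, and the 0.7 threshold are all hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Thought:
    text: str
    motivation: float  # hypothetical intrinsic-motivation score in [0, 1]


def select_thought(thoughts: List[Thought], threshold: float = 0.7) -> Optional[Thought]:
    """Return the most motivated thought if it clears the threshold, else None.

    Returning None means the agent stays silent for this turn.
    The threshold value is an assumption, not taken from the paper.
    """
    if not thoughts:
        return None
    best = max(thoughts, key=lambda t: t.motivation)
    return best if best.motivation >= threshold else None


# Covert thoughts accumulate in parallel with the overt conversation.
reservoir = [
    Thought("I have a related anecdote about hiking.", 0.4),
    Thought("Nobody has answered Ana's question yet; I know the answer.", 0.9),
]

chosen = select_thought(reservoir)
print(chosen.text if chosen else "(stay silent)")
```

In this sketch the agent speaks only when some thought is motivating enough, which is the contrast with next-speaker prediction: the decision is driven by what the agent has to say, not only by turn-taking cues.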
