Proactive Conversational Agents with Inner Thoughts

Xingyu Bruce Liu, Shitao Fang, Weiyan Shi, Chien-Sheng Wu, Takeo Igarashi, Xiang Anthony Chen

TL;DR

The paper tackles the challenge of making AI agents engage proactively in multi-party conversations by moving beyond next-speaker prediction to the Inner Thoughts framework. It combines a formative human study with a dual-process thinking model to generate covert thoughts, evaluate the intrinsic motivation to express them, and decide when to participate. Across simulation-based and real-user evaluations, Inner Thoughts outperforms next-speaker baselines in coherence, turn-taking, engagement, and perceived intelligence, and users preferred its socially adept behavior. The work contributes a cognitively inspired architecture, empirical heuristics for intrinsic motivation, and two open-source implementations (a playground and Swimmy Slackbot) that enable scalable exploration of proactive, human-like AI in social settings.

Abstract

One of the long-standing aspirations in conversational AI is to allow agents to autonomously take initiative in conversations, i.e., to be proactive. This is especially challenging for multi-party conversations. Prior NLP research focused mainly on predicting the next speaker from contexts like preceding conversations. In this paper, we demonstrate the limitations of such methods and rethink what it means for AI to be proactive in multi-party, human-AI conversations. We propose that just like humans, rather than merely reacting to turn-taking cues, a proactive AI formulates its own inner thoughts during a conversation, and seeks the right moment to contribute. Through a formative study with 24 participants and inspiration from linguistics and cognitive psychology, we introduce the Inner Thoughts framework. Our framework equips AI with a continuous, covert train of thoughts in parallel to the overt communication process, which enables it to proactively engage by modeling its intrinsic motivation to express these thoughts. We instantiated this framework into two real-time systems: an AI playground web app and a chatbot. Through a technical evaluation and user studies with human participants, our framework significantly surpasses existing baselines on aspects like anthropomorphism, coherence, intelligence, and turn-taking appropriateness.

Paper Structure

This paper contains 61 sections, 2 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: People's intrinsic motivation to engage in conversations: Heuristics of what factors influence people's decisions to express or withhold their thoughts during conversations, derived from our think-aloud study. Each heuristic contains two example mid-level themes from our codebook.
  • Figure 2: The Inner Thoughts framework for AI proactive engagement in conversations. A conversational event triggers the retrieval of relevant memories from long-term memory and the thought reservoir. New thoughts are then formed based on these activated memories and added to the thought reservoir. These thoughts are evaluated for the AI's intrinsic motivation to express them (score = 4.1 in the figure). The AI participates by articulating a thought at a selected moment in the ongoing conversation.
  • Figure 3: Prompt structure for evaluating the intrinsic motivation of a thought. The evaluator rates the AI's intrinsic motivation to engage on a 1-5 scale based on heuristics like relevance and coherence. A Chain-of-Thought (CoT) process evaluates both positive and negative factors, resulting in a weighted score.
  • Figure 4: Examples selected from simulation logs of AI turn-taking behaviors in the Inner Thoughts framework. The figure illustrates four behaviors: Participation by Motivation, where the AI joins the conversation by sharing relevant personal experience; Interruption, where the AI interjects with a strong contribution during an ongoing discussion; Retention, where the AI holds back a thought until it's contextually relevant; and Thought Evolution, where the AI adapts its responses as the conversation progresses.
  • Figure 5: The Inner Thoughts playground web app interface. Multiple AI agents and humans can be added to simulate a group conversation. Users can also view and edit each participant's long-term memory and thoughts.
  • ...and 3 more figures
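
The pipeline described in the Figure 2 caption (trigger, retrieval, thought formation, evaluation, participation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and method names, the word-overlap retrieval, the scoring rule, and the 3.5 threshold are all hypothetical stand-ins. In the paper, thought formation and the 1-5 motivation rating are produced by prompting a language model.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercased word set used for the toy overlap-based retrieval."""
    return set(re.findall(r"[a-z]+", text.lower()))


@dataclass
class Thought:
    text: str
    motivation: float = 1.0  # 1-5 scale, as in the Figure 3 caption


@dataclass
class InnerThoughtsAgent:
    long_term_memory: list[str]
    thought_reservoir: list[Thought] = field(default_factory=list)
    threshold: float = 3.5  # hypothetical cutoff, not taken from the paper

    def retrieve(self, event: str) -> list[str]:
        """Activate memories and earlier thoughts that share a word with the event."""
        cue = tokens(event)
        pool = self.long_term_memory + [t.text for t in self.thought_reservoir]
        return [item for item in pool if cue & tokens(item)]

    def form_thought(self, memories: list[str]) -> Thought:
        """Stand-in for an LLM call: join the activated memories into one thought."""
        text = " / ".join(memories) if memories else "(nothing comes to mind)"
        return Thought(text)

    def evaluate(self, memories: list[str]) -> float:
        """Stand-in for the LLM rating: more activated memories, higher score."""
        return min(5.0, 1.0 + 1.5 * len(memories))

    def on_event(self, event: str) -> str | None:
        """Run one cycle; return a thought to say aloud, or None to stay silent."""
        memories = self.retrieve(event)
        thought = self.form_thought(memories)
        thought.motivation = self.evaluate(memories)
        self.thought_reservoir.append(thought)

        best = max(self.thought_reservoir, key=lambda t: t.motivation)
        if best.motivation < self.threshold:
            return None  # keep the thought covert for now
        self.thought_reservoir.remove(best)
        return best.text
```

Calling `on_event` once per conversational event returns `None` while the agent's motivation stays below the threshold. Unexpressed thoughts stay in the reservoir, so a later event can still retrieve them, which loosely mirrors the Retention behavior described in the Figure 4 caption.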