
From Eliza to XiaoIce: Challenges and Opportunities with Social Chatbots

Heung-Yeung Shum, Xiaodong He, Di Li

TL;DR

The paper analyzes social chatbots as AI companions that must balance cognitive understanding (IQ) with emotional resonance (EQ) to sustain long, meaningful interactions. It presents XiaoIce as a concrete implementation that integrates a multimodal input pipeline, a core-chat engine, visual awareness, and a diverse skills catalog to achieve high engagement, as evidenced by a high CPS and extensive user interactions. The work argues for a design ethos that prioritizes empathy, consistency, and ethical safeguards, supported by an evaluation framework centered on CPS rather than on task success alone. In doing so, it outlines a practical architecture and demonstrates key capabilities (emotion recognition, image-based social commenting, autonomous poetry generation, and human-like TTS and singing) that together underscore the potential impact of socially adept chatbots on daily life and society.
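The TL;DR names a core-chat engine alongside a catalog of skills. As a rough illustration only, the sketch below routes a message to a registered skill when one of its trigger keywords appears and otherwise falls back to a core-chat responder. The class `ToyDialogueManager`, the keyword triggers, and the canned replies are hypothetical; this is not a description of how XiaoIce itself decides when to invoke a skill.

```python
from typing import Callable, List, Tuple

# Hypothetical handler type: takes the user's message, returns a reply.
Skill = Callable[[str], str]


class ToyDialogueManager:
    """Toy router: keyword-triggered skills first, core chat as the fallback."""

    def __init__(self, core_chat: Skill) -> None:
        self.core_chat = core_chat
        # Skills are checked in registration order; the first match wins.
        self.skills: List[Tuple[Tuple[str, ...], Skill]] = []

    def register_skill(self, triggers: List[str], handler: Skill) -> None:
        # Store triggers lower-cased so matching is case-insensitive.
        self.skills.append((tuple(t.lower() for t in triggers), handler))

    def respond(self, message: str) -> str:
        text = message.lower()
        for triggers, handler in self.skills:
            if any(t in text for t in triggers):
                return handler(message)
        # No skill matched, so keep the casual conversation going.
        return self.core_chat(message)


# Usage with placeholder responders (hypothetical replies).
dm = ToyDialogueManager(core_chat=lambda m: "Tell me more!")
dm.register_skill(["weather"], lambda m: "Weather skill: sunny and dry today.")
print(dm.respond("What's the weather like?"))  # Weather skill: sunny and dry today.
print(dm.respond("I had a long day."))         # Tell me more!
```

Keeping the fallback inside the same `respond` call mirrors the idea in Figure 3(b) that a skill can surface within an ongoing casual chat rather than in a separate command mode.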

Abstract

Conversational systems have come a long way since their inception in the 1960s. After decades of research and development, we've seen progress from Eliza and Parry in the 1960s and 1970s, to task-completion systems such as those in the DARPA Communicator program in the 2000s, to intelligent personal assistants such as Siri in the 2010s, to today's social chatbots like XiaoIce. Social chatbots' appeal lies not only in their ability to respond to users' diverse requests, but also in their ability to establish an emotional connection with users. The latter is achieved by satisfying users' needs for communication, affection, and social belonging. To further the advancement and adoption of social chatbots, their design must focus on user engagement and take both intellectual quotient (IQ) and emotional quotient (EQ) into account. Users should want to engage with a social chatbot; as such, we define the success metric for social chatbots as conversation-turns per session (CPS). Using XiaoIce as an illustrative example, we discuss key technologies in building social chatbots, from core chat to visual awareness to skills. We also show how XiaoIce can dynamically recognize emotion and engage the user throughout long conversations with appropriate interpersonal responses. As we become the first generation of humans ever living with AI, we have a responsibility to design social chatbots to be both useful and empathetic, so they will become ubiquitous and help society as a whole.
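CPS, as defined in the abstract, is an average taken over sessions. A minimal sketch of computing it from a chat log follows, assuming that each message from either party counts as one turn and that sessions are already segmented by a `session_id` field. Both conventions, and the `Message` record itself, are assumptions for illustration rather than details taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    session_id: str  # assumed: sessions are pre-segmented upstream
    speaker: str     # "user" or "bot"
    text: str


def conversation_turns_per_session(log: List[Message]) -> float:
    """Average number of turns per session (assumption: 1 message = 1 turn)."""
    turns_by_session = Counter(msg.session_id for msg in log)
    if not turns_by_session:
        return 0.0  # no sessions, so no engagement to measure
    return sum(turns_by_session.values()) / len(turns_by_session)


log = [
    Message("a", "user", "hi"),
    Message("a", "bot", "hello!"),
    Message("a", "user", "how are you?"),
    Message("a", "bot", "great, and you?"),
    Message("b", "user", "weather?"),
    Message("b", "bot", "sunny"),
]
print(conversation_turns_per_session(log))  # 3.0  (6 turns / 2 sessions)
```

Because CPS rewards longer sessions, it favors a bot that keeps the user talking, which is the engagement-first goal the abstract contrasts with task-completion metrics.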

Paper Structure

This paper contains 20 sections, 17 figures, 4 tables.

Figures (17)

  • Figure 1: Illustration of a task-completion system.
  • Figure 2: Examples of IPA actions. (a) Recommending a restaurant (reactive assistance) by Siri, and (b) reminder of an upcoming event with relevant traffic information (proactive assistance) by Cortana.
  • Figure 3: Chat examples between XiaoIce and users, showing (a) the emotional connection (the full conversation session is shown in Figure 14); and (b) how a skill (e.g., weather reporting) is invoked within a casual chat. Note that XiaoIce offers a perspective on the weather, e.g., "no need to use moisturizer".
  • Figure 4: Both IQ and EQ play key roles in a social chatbot. Not only is the area of China presented, but the number is also made relatable by comparing it with the area of the US, which the chatbot assumes the user knows.
  • Figure 5: A chat example between XiaoIce and a user, in English translation (a) and in Chinese (b), showing that both IQ and EQ are important for a social chatbot. The bot knows the answer, but rather than returning it directly, it attempts to steer the chat in a more interesting direction and extend the conversation.
  • ...and 12 more figures