Towards Anthropomorphic Conversational AI Part I: A Practical Framework

Fei Wei, Yaliang Li, Bolin Ding

TL;DR

This work tackles the challenge of imbuing conversational AI with anthropomorphic social and cognitive capabilities beyond what a single large language model call can offer. It proposes a practical, multi-module framework comprising memory management, awareness modules, and reflexive/analytic response generators to emulate human thinking and conversational dynamics. The two-stage plan prioritizes Stage 1: developing the framework and validating it with real-user interactions, showing notable improvements in social and conversational intelligence without LLM fine-tuning. The findings lay groundwork for Stage 2, which would employ human labeling and reinforcement learning to further align AI behavior with human preferences and capabilities, enabling more natural and engaging interactions in real-world applications.

Abstract

Large language models (LLMs), owing to their advanced natural language capabilities, have seen significant success in applications where the user interface is usually a conversational artificial intelligence (AI) agent that engages the user through multi-round conversations. However, many scenarios require agents to exhibit stronger social and conversational intelligence and to demonstrate more human-like (anthropomorphic) reactions. Foundational LLMs have yet to fully address this need, so a single call to a foundational model may be insufficient. To bridge this gap, we propose a two-stage solution. In this work, we focus on the first stage, introducing a multi-module framework designed to replicate the key aspects of human intelligence involved in conversations. This framework comprises thinking modules for reasoning, resource modules for managing knowledge and external information, and response modules for generating contextually appropriate interactions. With all the modules cooperating, the framework enables agents to provide a more human-like conversation experience. In the second stage of our approach, the resulting conversational data, after filtering and labeling, can serve as training and testing data for reinforcement learning, enabling the AI to better capture human preferences. This stage is left for future work. In our experiments, volunteers engaged in over 3000 rounds of conversation with the same AI character, powered either by a standalone LLM or by our framework integrating the same LLM. A separate group of evaluators rated the conversation samples, and the ratings show that our framework significantly enhanced social and conversational intelligence, even without fine-tuning the LLM.

Paper Structure

This paper contains 18 sections and 3 figures.

Figures (3)

  • Figure 1: The workflow of the framework. The awareness manager is called after the agent completes all of its outputs and focuses on self-awareness (opinion, feeling, and emotion), while the emotion manager focuses on controlling emotion in response to the user input. The conversation manager focuses on strategic analysis of the current topic or task of the conversation. The quick response generator provides a quick response to simple user input, or buys buffer time while the agent processes complex user input. Note that the analytical response generator runs in a loop: it analyzes the existing response messages and decides whether to continue outputting messages or to conclude the current turn.
  • Figure 2: Conversational Intelligence
  • Figure 3: Social Intelligence
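
The workflow described in the Figure 1 caption can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names (`TurnState`, `run_turn`, the module functions), the keyword rules standing in for LLM calls, the message cap, and the sequential ordering of the emotion manager, conversation manager, and quick response generator. The caption fixes only two points, which the sketch follows: the analytical response generator loops until it decides to conclude the turn, and the awareness manager runs after all outputs are complete.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical state container; the source does not define one.
@dataclass
class TurnState:
    user_input: str
    emotion: str = "neutral"
    strategy: str = ""
    messages: List[str] = field(default_factory=list)
    self_awareness: str = ""


def emotion_manager(state: TurnState) -> None:
    # Placeholder rule standing in for an LLM call that controls
    # the agent's emotion in response to the user input.
    state.emotion = "concerned" if "sad" in state.user_input.lower() else "neutral"


def conversation_manager(state: TurnState) -> None:
    # Placeholder for strategic analysis of the current topic or task.
    state.strategy = "comfort" if state.emotion == "concerned" else "chat"


def quick_response_generator(state: TurnState) -> None:
    # Short reply for simple input, or a filler that buys buffer time
    # while the slower modules process complex input.
    is_complex = len(state.user_input.split()) > 8
    state.messages.append("Let me think about that." if is_complex else "I hear you.")


def should_continue(state: TurnState, max_messages: int) -> bool:
    # Assumed stopping rule: a simple cap on messages per turn.
    return len(state.messages) < max_messages


def analytical_response_generator(state: TurnState, max_messages: int = 3) -> None:
    # Loop: inspect the messages sent so far, then either emit another
    # message or conclude the current turn.
    while should_continue(state, max_messages):
        index = len(state.messages)
        state.messages.append(f"[{state.strategy}/{state.emotion}] message {index}")


def awareness_manager(state: TurnState) -> None:
    # Called only after all outputs are complete; records a self-view.
    state.self_awareness = (
        f"sent {len(state.messages)} messages while feeling {state.emotion}"
    )


def run_turn(user_input: str) -> TurnState:
    state = TurnState(user_input)
    emotion_manager(state)
    conversation_manager(state)
    quick_response_generator(state)
    analytical_response_generator(state)
    awareness_manager(state)
    return state
```

Calling `run_turn("I feel sad today")` returns a state whose `messages` list holds the quick reply followed by the analytical messages. In a real system each placeholder rule would presumably be replaced by a prompt to the underlying LLM, and the quick response would likely be produced concurrently with the slower analysis rather than sequentially.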