Towards Anthropomorphic Conversational AI Part I: A Practical Framework
Fei Wei, Yaliang Li, Bolin Ding
TL;DR
This work tackles the challenge of giving conversational AI anthropomorphic social and cognitive capabilities beyond what a single large language model (LLM) call can offer. It proposes a practical, multi-module framework, with modules for memory management, awareness, and reflexive/analytic response generation, that emulates human thinking and conversational dynamics. The work addresses Stage 1 of a two-stage plan: developing the framework and validating it through interactions with real users. These interactions show notable improvements in social and conversational intelligence without fine-tuning the LLM. The findings lay the groundwork for Stage 2, which would use human labeling and reinforcement learning to further align AI behavior with human preferences and capabilities, enabling more natural and engaging interactions in real-world applications.
Abstract
Large language models (LLMs), owing to their advanced natural language capabilities, have seen significant success in applications whose user interface is typically a conversational artificial intelligence (AI) agent that engages users in multi-round conversations. However, many scenarios require agents to exhibit stronger social and conversational intelligence and to react in a more human-like (anthropomorphic) way. Foundational LLMs have yet to fully address this need, so a single call to a foundational model may be insufficient. To bridge this gap, we propose a two-stage solution. In this work, we focus on the first stage, introducing a multi-module framework designed to replicate the key aspects of human intelligence involved in conversation. The framework comprises thinking modules for reasoning, resource modules for managing knowledge and external information, and response modules for generating contextually appropriate replies. Working together, these modules enable agents to provide a more human-like conversational experience. In the second stage, the conversational data collected with the framework can, after filtering and labeling, serve as training and testing data for reinforcement learning, enabling the AI to better capture human preferences. This stage is left for future work. In our experiments, volunteers engaged in over 3,000 rounds of conversation with the same AI character, powered either by a standalone LLM or by our framework built on that same LLM. A separate group of evaluators rated the conversation samples, and their ratings show that our framework significantly enhanced the agent's social and conversational intelligence, even without fine-tuning the LLM.
