Continuous Learning Conversational AI: A Personalized Agent Framework via A2C Reinforcement Learning
Nandakishor M, Anjali M
TL;DR
The paper addresses personalization in conversational AI by moving beyond static LLMs to a Continuous Learning Conversational AI (CLCA) approach implemented with Advantage Actor-Critic (A2C) reinforcement learning. LLM-generated synthetic sales dialogues are used to train an A2C agent in a Gymnasium-based environment (SalesEnv). The state combines dialogue embeddings with a short-term history, and the action space comprises four continuous metrics that control the forthcoming responses. Key contributions include a CompanyProfile-driven synthetic data pipeline, an RL environment design with a defined reward structure, and an A2C-guided response-selection mechanism that combines RL strategy with LLM fluency. The approach enables continuous, personalized dialogue strategies, with practical implications for evolving AI companions in sales and beyond.
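To make the state and action layout concrete, the following is a minimal sketch of an environment shaped like the one described above. It is not the authors' SalesEnv: it uses only the standard library and merely mirrors Gymnasium's `reset`/`step` return convention. The class name `SalesEnvSketch`, the constants `EMBED_DIM` and `HISTORY_LEN`, the choice to store recent actions as the short-term history, the [0, 1] metric range, and the placeholder reward are all assumptions made for illustration, since the summary does not specify them.

```python
from collections import deque

EMBED_DIM = 8      # hypothetical embedding size
HISTORY_LEN = 3    # hypothetical number of past steps kept in the state
NUM_METRICS = 4    # the four continuous action metrics mentioned in the text


class SalesEnvSketch:
    """Toy stand-in for SalesEnv that mirrors Gymnasium's reset/step convention."""

    def __init__(self, dialogue_embeddings):
        # One embedding vector (list of floats) per dialogue turn.
        self.turns = dialogue_embeddings
        self.history = deque(maxlen=HISTORY_LEN)
        self.t = 0

    def _observation(self):
        # State = current turn embedding + flattened, zero-padded history.
        # Assumption: the history holds the agent's most recent actions.
        padding = [[0.0] * NUM_METRICS] * (HISTORY_LEN - len(self.history))
        past = [x for action in list(self.history) + padding for x in action]
        return list(self.turns[self.t]) + past

    def reset(self):
        self.history.clear()
        self.t = 0
        return self._observation(), {}

    def step(self, action):
        if len(action) != NUM_METRICS:
            raise ValueError("action must contain four continuous metrics")
        # Assumption: each metric is clipped to the range [0, 1].
        action = [min(1.0, max(0.0, a)) for a in action]
        # Placeholder reward (mean of the metrics); the paper defines its own.
        reward = sum(action) / NUM_METRICS
        self.history.append(action)
        terminated = self.t >= len(self.turns) - 1
        if not terminated:
            self.t += 1
        # Returns (observation, reward, terminated, truncated, info).
        return self._observation(), reward, terminated, False, {}
```

With these assumed sizes, each observation has `EMBED_DIM + HISTORY_LEN * NUM_METRICS` entries, and an episode ends once the last dialogue turn has been acted on.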
Abstract
Creating personalized and adaptable conversational AI remains a key challenge. This paper introduces a Continuous Learning Conversational AI (CLCA) approach, implemented using A2C reinforcement learning, to move beyond static Large Language Models (LLMs). We use simulated sales dialogues, generated by LLMs, to train an A2C agent. This agent learns to optimize conversation strategies for personalization, focusing on engagement and delivering value. Our system architecture integrates reinforcement learning with LLMs for both data creation and response selection. This method offers a practical way to build personalized AI companions that evolve through continuous learning, advancing beyond traditional static LLM techniques.
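One way to picture how reinforcement learning and an LLM could cooperate in response selection is sketched below. The abstract does not describe the actual mechanism, so this is only an illustration under stated assumptions: the agent emits a target vector of four metrics, each LLM-generated candidate response has already been scored on the same four metrics, and the candidate closest to the target (by Euclidean distance) is chosen. The function name `select_response`, the scoring of candidates, and the distance rule are all hypothetical.

```python
import math


def select_response(target_metrics, candidates):
    """Pick the candidate text whose metric scores are closest to the target.

    target_metrics: the agent's action, a sequence of four floats.
    candidates: list of (text, metrics) pairs; the metrics are assumed to be
    precomputed scores on the same four dimensions (hypothetical).
    """
    if not candidates:
        raise ValueError("at least one candidate response is required")
    # Euclidean distance between the agent's target and each candidate's scores.
    best_text, _ = min(candidates, key=lambda c: math.dist(target_metrics, c[1]))
    return best_text


# Usage with made-up candidates and scores.
candidates = [
    ("Would you like a quick overview of pricing?", [0.9, 0.2, 0.5, 0.4]),
    ("Tell me more about your team's workflow.", [0.3, 0.8, 0.6, 0.7]),
]
choice = select_response([0.25, 0.75, 0.5, 0.75], candidates)
```

In this arrangement the agent only decides the strategy (the metric vector), while the wording of every candidate still comes from the LLM.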
