Continuous Learning Conversational AI: A Personalized Agent Framework via A2C Reinforcement Learning

Nandakishor M, Anjali M

TL;DR

The paper tackles personalization in conversational AI by moving beyond static LLMs to a Continuous Learning Conversational AI (CLCA) framework implemented with Advantage Actor-Critic (A2C) reinforcement learning. It uses LLM-generated synthetic sales dialogues to train an A2C agent in a Gymnasium-based environment (SalesEnv). In this environment, the state combines dialogue embeddings with a short-term history, and the action space consists of four continuous metrics that steer the agent's next response. Key contributions include a CompanyProfile-driven synthetic data pipeline, the RL environment design with a defined reward structure, and an A2C-guided response-selection mechanism that pairs the RL policy's strategy with LLM fluency. The approach enables continuously learned, personalized dialogue strategies, with practical implications for evolving AI companions in sales and beyond.
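
A toy environment can show how a state of "dialogue embedding + short-term history" and an action of "four continuous metrics" fit together. The sketch below is hypothetical and is not the paper's SalesEnv. The class `ToySalesEnv`, the helper `toy_embed`, the dimensions, and the use of past action vectors as the "short-term history" are all assumptions for illustration. The reward (negative squared distance to assumed per-turn target metrics) is also an assumption. The code only mirrors Gymnasium's `reset`/`step` return conventions, in plain Python, so it runs without Gymnasium installed.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

EMBED_DIM = 8      # hypothetical; a real sentence encoder outputs far more dims
HISTORY_LEN = 3    # hypothetical number of past action vectors kept in the state
N_METRICS = 4      # the four continuous metrics that make up each action


def toy_embed(text: str, dim: int = EMBED_DIM) -> List[float]:
    """Deterministic stand-in for a sentence-embedding model (hypothetical)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


class ToySalesEnv:
    """Hypothetical SalesEnv-like environment following Gymnasium's API shape.

    Each episode replays one synthetic dialogue: a list of
    (customer_turn, target_metrics) pairs. The target metrics are an
    assumed labelling used only to define a placeholder reward.
    """

    def __init__(self, dialogue: List[Tuple[str, List[float]]]):
        self.dialogue = dialogue
        self.obs_dim = EMBED_DIM + HISTORY_LEN * N_METRICS
        self.t = 0
        self.history: List[List[float]] = []

    def _obs(self) -> List[float]:
        text, _ = self.dialogue[self.t]
        # Zero-pad the short-term history so the observation length is fixed.
        padding = [[0.0] * N_METRICS for _ in range(HISTORY_LEN - len(self.history))]
        flat_history = [x for vec in padding + self.history for x in vec]
        return toy_embed(text) + flat_history

    def reset(self, seed: Optional[int] = None,
              options: Optional[Dict] = None) -> Tuple[List[float], Dict]:
        # Gymnasium convention: reset returns (observation, info).
        self.t = 0
        self.history = []
        return self._obs(), {}

    def step(self, action: List[float]) -> Tuple[List[float], float, bool, bool, Dict]:
        # Clip each metric into [0, 1], as a Box(0, 1, shape=(4,)) space would bound it.
        action = [min(1.0, max(0.0, a)) for a in action]
        _, target = self.dialogue[self.t]
        # Placeholder reward: negative squared distance to the target metrics.
        reward = -sum((a - g) ** 2 for a, g in zip(action, target))
        self.history = (self.history + [action])[-HISTORY_LEN:]
        self.t += 1
        terminated = self.t >= len(self.dialogue)
        obs = [0.0] * self.obs_dim if terminated else self._obs()
        # Gymnasium convention: (obs, reward, terminated, truncated, info).
        return obs, reward, terminated, False, {}


if __name__ == "__main__":
    dialogue = [
        ("Hi, what does your CRM cost?", [0.8, 0.2, 0.5, 0.1]),
        ("That seems expensive.", [0.6, 0.7, 0.3, 0.2]),
    ]
    env = ToySalesEnv(dialogue)
    obs, info = env.reset()
    print(len(obs))  # 20 (8 embedding dims + 3 x 4 history values)
    obs, reward, terminated, truncated, info = env.step([0.8, 0.2, 0.5, 0.1])
    print(reward, terminated)
```

Keeping past actions in the state lets the policy condition on how it has been steering the conversation so far. Swapping `toy_embed` for a real encoder changes only `EMBED_DIM`.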

Abstract

Creating personalized and adaptable conversational AI remains a key challenge. This paper introduces a Continuous Learning Conversational AI (CLCA) approach, implemented using A2C reinforcement learning, to move beyond static Large Language Models (LLMs). We use simulated sales dialogues, generated by LLMs, to train an A2C agent. This agent learns to optimize conversation strategies for personalization, focusing on engagement and delivering value. Our system architecture integrates reinforcement learning with LLMs for both data creation and response selection. This method offers a practical way to build personalized AI companions that evolve through continuous learning, advancing beyond traditional static LLM techniques.
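
The A2C training signal can be worked through as arithmetic on one short rollout. This is a minimal sketch of the standard A2C objective. It assumes a diagonal Gaussian policy over the four continuous metrics, a common choice for continuous actions that the paper does not confirm. The function names and the coefficient defaults (`gamma`, `value_coef`, `ent_coef`) are hypothetical. Plain floats stand in for the tensors an autodiff framework would use to backpropagate into the actor and critic networks.

```python
import math
from typing import List, Sequence, Tuple


def discounted_returns(rewards: List[float], bootstrap_value: float,
                       gamma: float = 0.99) -> List[float]:
    """n-step returns R_t = r_t + gamma * R_{t+1}, seeded with V(s_T) (0.0 if terminal)."""
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))


def gaussian_log_prob(action: Sequence[float], mean: Sequence[float],
                      log_std: Sequence[float]) -> float:
    """Log-density of a diagonal Gaussian policy (assumed) over the action metrics."""
    total = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        var = math.exp(2 * ls)
        total += -((a - mu) ** 2) / (2 * var) - ls - 0.5 * math.log(2 * math.pi)
    return total


def a2c_losses(rewards: List[float], values: List[float], bootstrap_value: float,
               log_probs: List[float], entropies: List[float],
               gamma: float = 0.99, value_coef: float = 0.5,
               ent_coef: float = 0.01) -> Tuple[float, float, float]:
    """Return (actor_loss, critic_loss, total_loss) for one rollout segment."""
    returns = discounted_returns(rewards, bootstrap_value, gamma)
    # Advantage A_t = R_t - V(s_t): how much better the outcome was than the critic expected.
    advantages = [R - v for R, v in zip(returns, values)]
    n = len(rewards)
    # Policy-gradient term; advantages act as constants (no gradient into the critic here).
    actor_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    # Critic regresses V(s_t) toward the n-step return.
    critic_loss = sum(adv ** 2 for adv in advantages) / n
    # Entropy bonus discourages the policy from collapsing too early.
    entropy = sum(entropies) / n
    total = actor_loss + value_coef * critic_loss - ent_coef * entropy
    return actor_loss, critic_loss, total
```

A positive advantage raises the log-probability of the chosen metric vector, and a negative one lowers it. The critic's value estimate is what turns raw rewards into this lower-variance signal.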

Paper Structure

This paper contains 13 sections, 3 equations, 4 algorithms.