Affective Multimodal Agents with Proactive Knowledge Grounding for Emotionally Aligned Marketing Dialogue
Lin Yu, Xiaofei Han, Yifei Kang, Chiung-Yi Tseng, Danyang Zhang, Ziqian Bi, Zhimo Han
TL;DR
The paper addresses the limitations of reactive, text-only marketing agents by introducing AffectMind, a multimodal dialogue system with proactive reasoning and real-time knowledge grounding. It combines three components, the Proactive Knowledge Grounding Network (PKGN), the Emotion-Intent Alignment Model (EIAM), and the Reinforced Discourse Loop (RDL), to maintain emotional coherence, adapt persuasion strategies, and optimize long-term engagement. Two new datasets, MM-ConvMarket and AffectPromo, support the evaluation. Compared with strong baselines, AffectMind improves emotional consistency (+26%), persuasive success (+19%), and user engagement (+23%); the gains are statistically significant, and ablations and qualitative analysis provide further insight. The work also discusses ethical considerations (transparency, user autonomy, privacy, and fairness) and outlines future directions: efficiency, cross-cultural generalization, long-term relationship modeling, and multi-party conversations.
Abstract
Recent advances in large language models (LLMs) have enabled fluent dialogue systems, but most remain reactive and struggle in emotionally rich, goal-oriented settings such as marketing conversations. To address this limitation, we propose AffectMind, a multimodal affective dialogue agent that performs proactive reasoning and dynamic knowledge grounding to sustain emotionally aligned and persuasive interactions. AffectMind combines three components: a Proactive Knowledge Grounding Network (PKGN) that continuously updates factual and affective context from text, vision, and prosody; an Emotion-Intent Alignment Model (EIAM) that jointly models user emotion and purchase intent to adapt persuasion strategies; and a Reinforced Discourse Loop (RDL) that optimizes emotional coherence and engagement via reinforcement signals from user responses. Experiments on two newly curated marketing dialogue datasets, MM-ConvMarket and AffectPromo, show that AffectMind outperforms strong LLM-based baselines in emotional consistency (+26%), persuasive success rate (+19%), and long-term user engagement (+23%), highlighting emotion-grounded proactivity as a key capability for commercial multimodal agents.
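To make the division of labor among the three components concrete, the following is a toy sketch of one dialogue turn. It is not the authors' implementation: the abstract does not specify model internals, so every name here (`Context`, `pkgn_update`, `eiam_select_strategy`, `rdl_update`), the equal modality weighting, the smoothing factor, the strategy labels, and the bandit-style value update are assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Running factual and affective context (hypothetical structure)."""
    facts: list = field(default_factory=list)
    valence: float = 0.0  # assumed scale: -1 (negative) to +1 (positive)


def pkgn_update(ctx, fact, text_val, vision_val, prosody_val):
    """PKGN stand-in: store a new fact and fuse per-modality valence scores."""
    ctx.facts.append(fact)
    fused = (text_val + vision_val + prosody_val) / 3.0  # assumed equal weights
    ctx.valence = 0.5 * ctx.valence + 0.5 * fused  # assumed smoothing factor
    return ctx


def eiam_select_strategy(valence, intent):
    """EIAM stand-in: map (emotion, purchase intent) to a persuasion strategy."""
    if valence < 0:
        return "empathize"  # negative affect: acknowledge before persuading
    if intent >= 0.5:
        return "offer"      # positive affect and high intent: present an offer
    return "inform"         # otherwise: provide product information


def rdl_update(values, strategy, reward, lr=0.1):
    """RDL stand-in: move a per-strategy value estimate toward the reward."""
    old = values.get(strategy, 0.0)
    values[strategy] = old + lr * (reward - old)
    return values


# One turn: the user sounds mildly frustrated while asking about price.
ctx = pkgn_update(Context(), "user asked about price", -0.6, -0.3, -0.3)
strategy = eiam_select_strategy(ctx.valence, intent=0.7)
values = rdl_update({}, strategy, reward=1.0)  # reward: user kept engaging
print(strategy, values)
```

In this sketch the negative fused valence takes priority over the high purchase intent, so the agent acknowledges the user's frustration before making an offer. This is one simple way to read "emotionally aligned" persuasion; the actual system presumably learns such behavior rather than hard-coding it.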
