Affective Multimodal Agents with Proactive Knowledge Grounding for Emotionally Aligned Marketing Dialogue

Lin Yu, Xiaofei Han, Yifei Kang, Chiung-Yi Tseng, Danyang Zhang, Ziqian Bi, Zhimo Han

TL;DR

The paper tackles the limitations of reactive, text-only marketing agents by introducing AffectMind, a multimodal dialogue system with proactive reasoning and real-time knowledge grounding. It combines three innovations—Proactive Knowledge Grounding Network (PKGN), Emotion-Intent Alignment Model (EIAM), and Reinforced Discourse Loop (RDL)—to maintain emotional coherence, adapt persuasion strategies, and optimize long-term engagement. Two new datasets, MM-ConvMarket and AffectPromo, enable rigorous evaluation, and AffectMind demonstrates substantial gains in emotional consistency (+26%), persuasive success (+19%), and user engagement (+23%) compared with strong baselines, with statistically significant improvements and insights from ablations and qualitative analysis. The work also addresses ethical considerations like transparency, user autonomy, privacy, and fairness, and outlines practical future directions, including efficiency, cross-cultural generalization, long-term relationship modeling, and multi-party conversations.

Abstract

Recent advances in large language models (LLMs) have enabled fluent dialogue systems, but most remain reactive and struggle in emotionally rich, goal-oriented settings such as marketing conversations. To address this limitation, we propose AffectMind, a multimodal affective dialogue agent that performs proactive reasoning and dynamic knowledge grounding to sustain emotionally aligned and persuasive interactions. AffectMind combines three components: a Proactive Knowledge Grounding Network (PKGN) that continuously updates factual and affective context from text, vision, and prosody; an Emotion-Intent Alignment Model (EIAM) that jointly models user emotion and purchase intent to adapt persuasion strategies; and a Reinforced Discourse Loop (RDL) that optimizes emotional coherence and engagement via reinforcement signals from user responses. Experiments on two newly curated marketing dialogue datasets, MM-ConvMarket and AffectPromo, show that AffectMind outperforms strong LLM-based baselines in emotional consistency (+26%), persuasive success rate (+19%), and long-term user engagement (+23%), highlighting emotion-grounded proactivity as a key capability for commercial multimodal agents.

Paper Structure

This paper contains 49 sections, 14 equations, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: AffectMind: A proactive marketing dialogue system that senses user signals, aligns with emotion and intent, and generates empathetic, persuasive responses.
  • Figure 2: Architecture of the AffectMind algorithm, showcasing multimodal fusion, proactive knowledge generation, emotion-intent alignment, and a reinforced discourse loop for adaptive marketing dialogues.
  • Figure 3: Performance comparison on marketing dialogue tasks.
  • Figure 4: Ablation study results showing the impact of removing individual components (PKGN, EIAM, RDL) and design choices (multimodal input, dynamic knowledge) on emotional consistency, persuasive success, and user engagement.
  • Figure 5: Performance degradation across dialogue turns.