Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations

Lichao Zhang, Jia Yu, Shuai Zhang, Long Li, Yangyang Zhong, Guanbao Liang, Yuming Yan, Qing Ma, Fangsheng Weng, Fayu Pan, Jing Li, Renjun Xu, Zhenzhong Lan

TL;DR

This study underscores the importance of multi-modality in chatbot design, offering valuable insights for creating more engaging and immersive AI communication experiences and informing the broader AI community about the benefits of multi-modal interactions in enhancing user engagement.

Abstract

Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We conduct a comprehensive analysis using a diverse set of chatbots and real-user interaction data, employing metrics such as retention rate and conversation length to evaluate user engagement. Our findings reveal a significant enhancement in user engagement with multi-modal interactions compared to text-only dialogues. Notably, the incorporation of a third modality significantly amplifies engagement beyond the benefits observed with just two modalities. These results suggest that multi-modal interactions optimize cognitive processing and facilitate richer information comprehension. This study underscores the importance of multi-modality in chatbot design, offering valuable insights for creating more engaging and immersive AI communication experiences and informing the broader AI community about the benefits of multi-modal interactions in enhancing user engagement.

Paper Structure

This paper contains 30 sections, 1 equation, 11 figures, and 5 tables.

Figures (11)

  • Figure 1: Overview of our comprehensive benchmark for evaluating user engagement during multi-modal conversational interaction. The top row shows the conversation interaction modalities: three exemplar single-modality conversations (text, image, and audio) and an exemplar multi-modal conversation. The second row presents the conversation interaction factors derived from all four conversation types. Finally, we use three metrics to measure user engagement with respect to these interaction factors.
  • Figure 2: Proportions of different types of characters and users. The top left illustrates the proportions of different character types, the top right shows the age distribution of users, the bottom left indicates the gender distribution among users, and the bottom right displays the geographic distribution of users.
  • Figure 3: Comparison of the influence of different image styles on three user engagement measures. Each subfigure compares the distribution of each measure across all image styles.
  • Figure 4: Comparison of the influence of different audio styles on three user engagement measures. Each subfigure compares the distribution of each measure across all audio styles.
  • Figure 5: Single-modal correlation trends. The x-axis represents quantified condition values, while the y-axis shows Retention, CL, and UUL results. A rise in condition values corresponds to improved user engagement, highlighting a clear positive correlation.
  • ...and 6 more figures