SocialMind: LLM-based Proactive AR Social Assistive System with Human-like Perception for In-situ Live Interactions

Bufang Yang, Yunqi Guo, Lilin Xu, Zhenyu Yan, Hongkai Chen, Guoliang Xing, Xiaofan Jiang

TL;DR

SocialMind is a first-of-its-kind proactive AR social assistive system that perceives live social cues through multi-modal sensors, including nonverbal signals and implicit personas, and uses a multi-tier collaborative generation strategy to produce in-situ social suggestions. It combines a social factor-aware cache with intention inference and concise chain-of-thought reasoning to deliver timely guidance on AR glasses without interrupting conversation. Evaluations on three public datasets and a 20-participant user study show 38.3% higher engagement than baselines, with 95% of participants willing to use the system in their live social interactions, underscoring the practical potential of AR-based, LLM-driven social assistance. The work advances live, context-aware social coaching and offers a concrete architecture for scalable, privacy-conscious, proactive social support.

Abstract

Social interactions are fundamental to human life. The recent emergence of large language model (LLM)-based virtual assistants has demonstrated their potential to revolutionize human interactions and lifestyles. However, existing assistive systems mainly provide reactive services to individual users, rather than offering in-situ assistance during live social interactions with conversational partners. In this study, we introduce SocialMind, the first LLM-based proactive AR social assistive system that provides users with in-situ social assistance. SocialMind employs human-like perception, leveraging multi-modal sensors to extract both verbal and nonverbal cues, social factors, and implicit personas, and incorporates these social cues into LLM reasoning for social suggestion generation. Additionally, SocialMind employs a multi-tier collaborative generation strategy and a proactive update mechanism to display social suggestions on Augmented Reality (AR) glasses, ensuring that suggestions reach users in a timely manner without disrupting the natural flow of conversation. Evaluations on three public datasets and a user study with 20 participants show that SocialMind achieves 38.3% higher engagement compared to baselines, and 95% of participants are willing to use SocialMind in their live social interactions.

Paper Structure

This paper contains 50 sections, 28 figures, 4 tables.

Figures (28)

  • Figure 1: Overview of SocialMind. SocialMind provides in-situ social assistance to the user during live social interactions with conversational partners. SocialMind automatically performs human-like social perception, generates social suggestions to assist users, and proactively displays them on the user's AR glasses as the conversation proceeds. Users can seamlessly refer to these suggestions while interacting with conversational partners.
  • Figure 2: Survey results for social experience and assistance demand.
  • Figure 3: System overview of SocialMind. SocialMind leverages the multi-modal sensor data to achieve human-like perception. The extracted verbal and nonverbal cues, social factors, and implicit personas are integrated into LLMs, generating in-situ social suggestions with points and examples displayed on the user's AR glasses.
  • Figure 4: SocialMind's primary user detection.
  • Figure 5: Implicit persona adaptation in SocialMind.
  • ...and 23 more figures