
Safe Guard: an LLM-agent for Real-time Voice-based Hate Speech Detection in Social Virtual Reality

Yiwen Xu, Qinyang Hou, Hongyu Wan, Mirjana Prpa

TL;DR

Safe Guard tackles real-time, voice-based hate speech detection in social VR by fusing GPT-3.5-based text analysis with a convolutional neural network (CNN) that processes audio features (root-mean-square energy, RMS, and 40-dimensional MFCCs). The embodied agent runs in VRChat in two modes, conversational and observational, and uses three prompting strategies (direct, definition-based, and few-shot). It applies a strict multimodal decision rule: it acts only when both modalities flag hate speech, which reduces false positives. On the HateMM dataset and in VRChat experiments, the combined approach achieves high precision (≈0.96), substantially lowers false positives (FPR ≈0.01), and has an average latency of about 1.5 seconds, suggesting that real-time multimodal moderation is practical in immersive environments. The work demonstrates the value of combining LLM-driven moderation with audio cues to improve safety in social VR and lays a foundation for future multimodal safety systems.
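The strict decision rule amounts to a logical AND over the two per-modality verdicts. The Python sketch below assumes each detector already emits a boolean verdict per utterance; the names `ModalityVerdicts`, `strict_fusion`, and `precision_and_fpr` are hypothetical, and the toy labels are invented for illustration, not taken from HateMM or the VRChat experiments. It also computes the two metrics the TL;DR reports, precision and false-positive rate (FPR).

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ModalityVerdicts:
    """Per-utterance verdicts from the two detectors (hypothetical structure)."""
    text_is_hate: bool   # e.g. output of an LLM prompt over the transcript
    audio_is_hate: bool  # e.g. output of an audio classifier over RMS + MFCC features


def strict_fusion(v: ModalityVerdicts) -> bool:
    """Flag an utterance only if BOTH modalities flag it (logical AND)."""
    return v.text_is_hate and v.audio_is_hate


def precision_and_fpr(preds: Sequence[bool], labels: Sequence[bool]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); false-positive rate = FP / (FP + TN)."""
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    tn = sum(not p and not y for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return precision, fpr


if __name__ == "__main__":
    # Toy data invented for illustration; not from HateMM or VRChat.
    labels = [True, True, False, False, False, False]
    text = [True, True, True, True, False, False]
    audio = [True, True, False, True, False, True]
    fused = [strict_fusion(ModalityVerdicts(t, a)) for t, a in zip(text, audio)]
    print("text only:", precision_and_fpr(text, labels))
    print("fused:    ", precision_and_fpr(fused, labels))
```

On this toy data, the text-only predictions have precision 0.5 and FPR 0.5, while the fused predictions have precision of about 0.67 and FPR 0.25. Because the AND rule can only remove positive predictions, the fused system never has more false positives than either detector alone; the trade-off is that a hate utterance missed by either modality is also missed by the fused system.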

Abstract

In this paper, we present Safe Guard, an LLM agent for detecting hate speech in voice-based interactions in social VR (VRChat). Our system combines OpenAI GPT with audio feature extraction to handle voice interactions in real time. We contribute a system design and an evaluation demonstrating that our approach detects hate speech and reduces false positives compared to currently available approaches. Our results indicate the potential of LLM-based agents for creating safer virtual environments and lay the groundwork for further advances in LLM-driven moderation.
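Of the two audio features named in the TL;DR, RMS energy is simple enough to show directly. The Python sketch below computes frame-wise RMS with plain loops; the function name `frame_rms`, the 16 kHz sample rate, and the 2048/512 frame and hop sizes are assumptions, not the paper's settings (the sizes mirror the defaults of `librosa.feature.rms`, although that function also pads and centers frames, which this sketch does not). The 40-dimensional MFCCs are usually computed with a DSP library, for example `librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)`, and are not reimplemented here.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_length: int = 2048,
              hop_length: int = 512) -> List[float]:
    """RMS energy sqrt(mean(x**2)) for each full frame; no padding or centering."""
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    rms = []
    # Slide a window of frame_length samples forward by hop_length samples;
    # a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = samples[start:start + frame_length]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_length))
    return rms


if __name__ == "__main__":
    sr = 16_000  # assumed sample rate
    # One second of a synthetic 440 Hz sine with amplitude 0.5.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    values = frame_rms(tone)
    # Each value should be close to 0.5 / sqrt(2), the RMS of a sine wave.
    print(len(values), values[0])
```

Per-frame RMS gives a coarse loudness contour over time; in a pipeline like the one described, such a contour would be stacked with the MFCC frames to form the audio model's input.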

Paper Structure (36 sections, 5 equations, 5 figures, 3 tables)