SpeechCap: Leveraging Playful Impact Captions to Facilitate Interpersonal Communication in Social Virtual Reality
Yu Zhang, Yi Wen, Siying Hu, Zhicong Lu
TL;DR
This work tackles the limited expressiveness of interpersonal communication in social VR by introducing SpeechCap, a real-time system that converts speech into interactive impact captions combining verbal content with non-verbal cues. It defines a design space for impact captions through TV-variety-show analysis and expert co-design, then validates the approach with a proof-of-concept implementation and an in-lab study (n=14) showing that captions can clarify and enrich conversations while enabling playful interactions. The study highlights benefits in emotional expression, information highlighting, and speaker identification, but also notes ambiguity risks in non-textual cues and keyword proliferation, which motivate design implications. Overall, the work contributes a concrete design space, a functional system, and evidence-based guidance for deploying expressive, multimodal communication tools in social VR, with applications in accessibility, education, and live streaming.
Abstract
Social Virtual Reality (VR) has emerged as a promising platform that brings immersive, interactive, and engaging mechanisms to collaborative activities in virtual spaces. However, interpersonal communication in social VR is still limited by existing mediums and channels. To bridge this gap, we propose a novel method for mediating real-time conversation in social VR, which uses impact captions, a type of typographic visual effect widely used in videos, to convey both verbal and non-verbal information. We first investigated the design space of impact captions through content analysis and a co-design session with four experts. Next, we implemented SpeechCap as a proof-of-concept system with which users can communicate with each other using speech-driven impact captions in VR. Through a user study (n=14), we evaluated the effectiveness of the visual and interaction design of impact captions, highlighting the interactivity and the integration of verbal and non-verbal information in communication mediums. Finally, we discussed visual rhetoric, interactivity, and ambiguity as the main findings from the study, and provided design implications for future work on facilitating interpersonal communication in social VR.
