Narrative Review of Emotional Expression Support in XR: Psychophysiology of Speech-to-Text Interfaces

Sunday David Ubur, Denis Gracanin

TL;DR

This paper investigates how to embed emotional expression into STT interfaces in XR, focusing on psychophysiology, accessibility, and affective design. It synthesizes 37 studies from 2020–2024 across DHH technologies, captioning innovations, emotion recognition, and empathic systems, and finds a persistent gap in real-time emotion-aware captions embedded in immersive contexts. It highlights promising approaches such as animated captions, emojilization, color-coded overlays, and avatar-based emotion visualization, while noting scalability and integration challenges. The work argues for interdisciplinary collaboration to develop affect-responsive captioning interfaces that reduce cognitive load and improve engagement in education and training. Overall, it emphasizes moving beyond neutral captions toward emotionally faithful, user-centered XR communication tools.

Abstract

This narrative review examines recent advancements, limitations, and research gaps in integrating emotional expression into speech-to-text (STT) interfaces within extended reality (XR) environments. Drawing on 37 peer-reviewed studies published between 2020 and 2024, we synthesize literature across multiple domains, including affective computing, psychophysiology, captioning innovation, and immersive human-computer interaction. Thematic categories include communication enhancement technologies for Deaf and Hard of Hearing (DHH) users, emotive captioning strategies, visual and affective augmentation in AR/VR, speech emotion recognition, and the development of empathic systems. Despite the growing accessibility of real-time STT tools, such systems largely fail to convey affective nuance, limiting the richness of communication for DHH users and other caption consumers. This review highlights emerging approaches such as animated captions, emojilization, color-coded overlays, and avatar-based emotion visualization, but finds a persistent gap in real-time emotion-aware captioning within immersive XR contexts. We identify key research opportunities at the intersection of accessibility, XR, and emotional expression, and propose future directions for the development of affect-responsive, user-centered captioning interfaces.

Paper Structure (23 sections, 2 figures)

Figures (2)

  • Figure 1: Thematic map summarizing the key areas reviewed in this study, including communication enhancement technologies, innovations in captioning, emotion recognition in XR, empathic machine interfaces, and emerging strategies in visual and affective augmentation.
  • Figure 2: Workflow of an emotion-aware STT system in XR environments. Speech is converted to text, enhanced with emotional cues (e.g., emojis, color, animation), and displayed in AR/VR settings.
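
The workflow described for Figure 2 (speech to text, enrichment with emotional cues, display in AR/VR) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the reviewed studies: the emotion labels, the keyword lists, the emoji/color/animation table, and the function names `detect_emotion`, `style_caption`, `render`, and `pipeline` are all hypothetical. The keyword matcher stands in for a real speech emotion recognition model, the input is assumed to be already-transcribed text, and `render` returns a string in place of an actual XR display call.

```python
from dataclasses import dataclass

# Hypothetical cue table: emotion -> (emoji, hex color, animation name).
# These mappings are illustrative assumptions, not values from the review.
CUES = {
    "joy": ("😊", "#FFD700", "bounce"),
    "anger": ("😠", "#FF4500", "shake"),
    "sadness": ("😢", "#1E90FF", "fade"),
    "neutral": ("", "#FFFFFF", "none"),
}

# Toy keyword lists standing in for a trained speech emotion recognizer.
KEYWORDS = {
    "joy": {"happy", "great", "love"},
    "anger": {"furious", "angry", "hate"},
    "sadness": {"sad", "sorry", "miss"},
}


@dataclass
class Caption:
    text: str
    emotion: str
    emoji: str
    color: str
    animation: str


def detect_emotion(text: str) -> str:
    """Return the first emotion whose keywords appear in the text, else 'neutral'."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    for emotion, keys in KEYWORDS.items():
        if words & keys:
            return emotion
    return "neutral"


def style_caption(text: str, emotion: str) -> Caption:
    """Attach emoji, color, and animation cues; unknown labels fall back to neutral."""
    if emotion not in CUES:
        emotion = "neutral"
    emoji, color, animation = CUES[emotion]
    return Caption(text, emotion, emoji, color, animation)


def render(caption: Caption) -> str:
    """Stand-in for an AR/VR overlay: returns a string describing the styled caption."""
    return f"[{caption.color}|{caption.animation}] {caption.text} {caption.emoji}".strip()


def pipeline(transcript: str) -> str:
    """Transcribed text -> emotion label -> styled caption -> rendered overlay string."""
    return render(style_caption(transcript, detect_emotion(transcript)))


if __name__ == "__main__":
    print(pipeline("I am so happy today!"))
    print(pipeline("The meeting starts at noon."))
```

Keeping the cue table separate from the recognizer means the visual encoding (emoji, color, or animation) could be swapped or personalized per user without touching the emotion detection step, which is one plausible way to support the user-centered design the review calls for.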