Flow-Aware Diffusion for Real-Time VR Restoration: Enhancing Spatiotemporal Coherence and Efficiency
Yitong Zhu, Qianghong Dong, Guanxuan Jiang, Zhuowen Liang, Yuyang Wang
TL;DR
This work tackles cybersickness in VR by addressing mismatched visual motion, particularly excessive optical flow, which disrupts vestibular-visual coherence. It introduces U-MAD, a flow-guided, diffusion-based video restoration framework that operates at the image level, with a plug-and-play U-shaped Mamba backbone and a Unified Motion-Structure Embedding (UMSE) that conditions denoising on motion and structural priors. The system combines a Global Context Module, a Post-Temporal Context Module, and flow-aware conditioning within a diffusion objective that jointly optimizes reconstruction, temporal coherence, and flow consistency while running in real time. Results on VR and omnidirectional datasets show improved temporal stability, and a user study shows significant reductions in cybersickness, indicating practical impact for safer, more comfortable immersive experiences.
Abstract
Cybersickness remains a critical barrier to the widespread adoption of Virtual Reality (VR), particularly in scenarios involving intense or artificial motion cues. A key contributor is excessive optical flow: perceived visual motion that, when unmatched by vestibular input, leads to sensory conflict and discomfort. Previous efforts have explored geometric or hardware-based mitigation strategies, but such methods often rely on predefined scene structures, manual tuning, or intrusive equipment. In this work, we propose U-MAD, a lightweight, real-time, AI-based solution that suppresses perceptually disruptive optical flow directly at the image level. Unlike prior handcrafted approaches, U-MAD learns to attenuate high-intensity motion patterns from rendered frames without requiring mesh-level editing or scene-specific adaptation. Designed as a plug-and-play module, U-MAD integrates into existing VR pipelines and generalizes well to procedurally generated environments. Experiments show that U-MAD consistently reduces average optical flow and enhances temporal stability across diverse scenes. A user study further confirms that reducing visual motion improves perceptual comfort and alleviates cybersickness symptoms. These findings demonstrate that perceptually guided modulation of optical flow is an effective and scalable approach to creating more user-friendly immersive experiences. The code will be released at https://github.com/XXXXX (upon publication).
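
To make the "average optical flow" measure concrete, the following is a minimal sketch of one common way such a quantity can be computed: the mean Euclidean magnitude of a dense flow field, and the relative reduction between two fields. It is an illustration only, not the evaluation code of this work. The function names `mean_flow_magnitude` and `relative_reduction`, the nested-list flow representation, and the toy values are all assumptions introduced here.

```python
import math
from typing import Sequence, Tuple

# Assumed representation: a dense flow field as rows of (u, v)
# per-pixel displacements, in pixels per frame.
Flow = Sequence[Sequence[Tuple[float, float]]]


def mean_flow_magnitude(flow: Flow) -> float:
    """Average Euclidean length of all (u, v) vectors in one flow field."""
    total = 0.0
    count = 0
    for row in flow:
        for u, v in row:
            total += math.hypot(u, v)
            count += 1
    if count == 0:
        raise ValueError("flow field is empty")
    return total / count


def relative_reduction(before: Flow, after: Flow) -> float:
    """Fraction by which mean flow magnitude drops from `before` to `after`."""
    base = mean_flow_magnitude(before)
    if base == 0.0:
        return 0.0  # nothing to reduce in a static frame pair
    return 1.0 - mean_flow_magnitude(after) / base


# Toy 2x2 flow fields (hypothetical values, not measured data).
original = [[(3.0, 4.0), (0.0, 0.0)], [(6.0, 8.0), (0.0, 5.0)]]
attenuated = [[(1.5, 2.0), (0.0, 0.0)], [(3.0, 4.0), (0.0, 2.5)]]

if __name__ == "__main__":
    print(mean_flow_magnitude(original))             # 5.0
    print(mean_flow_magnitude(attenuated))           # 2.5
    print(relative_reduction(original, attenuated))  # 0.5
```

In practice the flow fields would come from an optical flow estimator applied to consecutive rendered frames, and the per-frame values would be averaged over a sequence; both choices are left open in this sketch.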
