ScanGAN360: A Generative Model of Realistic Scanpaths for 360$^{\circ}$ Images
Daniel Martin, Ana Serrano, Alexander W. Bergman, Gordon Wetzstein, Belen Masia
TL;DR
ScanGAN360 presents a conditional GAN for generating realistic 360° gaze scanpaths by combining a sphere-aware 3D scanpath parameterization with a spherical DTW-based loss. A two-branch architecture uses panoramic convolutions and CoordConv to accommodate 360° distortions, enabling generation of long, diverse scanpaths (up to 30 seconds) at high speed (~$10^3$ scanpaths/s) without ground-truth one-to-one mappings. The approach outperforms prior 360° scanpath methods and closely approaches human baselines across multiple datasets, with thorough ablations validating the $DTW_{sph}$ loss and architectural choices. Behavioral analyses show realistic exploration dynamics, equator bias, and inter-observer congruency, supporting practical deployment in VR content design, scanpath-driven thumbnails, and avatar gaze. The work also discusses limitations (fixed length, sampling rate) and outlines future directions for variable-length trajectories and low-level oculomotor dynamics, with code and models released for further research.
Abstract
Understanding and modeling the dynamics of human gaze behavior in 360$^\circ$ environments is a key challenge in computer vision and virtual reality. Generative adversarial approaches could alleviate this challenge by generating a large number of possible scanpaths for unseen images. Existing methods for scanpath generation, however, do not adequately predict realistic scanpaths for 360$^\circ$ images. We present ScanGAN360, a new generative adversarial approach to address this challenging problem. Our network generator is tailored to the specifics of 360$^\circ$ images representing immersive environments. Specifically, we accomplish this by using a spherical adaptation of dynamic time warping as a loss function and by proposing a novel parameterization of 360$^\circ$ scanpaths. The quality of our scanpaths outperforms competing approaches by a large margin and is almost on par with the human baseline. ScanGAN360 thus allows fast simulation of large numbers of virtual observers, whose behavior mimics real users, enabling a better understanding of gaze behavior and novel applications in virtual scene design.
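To make the two ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes scanpaths are given as latitude/longitude pairs in radians, maps them to 3D points on the unit sphere, and runs classic (hard-minimum) dynamic time warping with great-circle distance as the local cost. The function names `to_unit_sphere`, `great_circle`, and `spherical_dtw` are hypothetical, and a training loss would presumably need a differentiable variant, which this sketch does not provide.

```python
import numpy as np


def to_unit_sphere(lat, lon):
    """Map latitude/longitude (radians) to 3D points on the unit sphere."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
        axis=-1,
    )


def great_circle(p, q):
    """Angular distance (radians) between two unit vectors.

    Uses the chord length, which stays accurate for nearly identical points.
    """
    half_chord = np.linalg.norm(p - q) / 2.0
    return float(2.0 * np.arcsin(np.clip(half_chord, 0.0, 1.0)))


def spherical_dtw(a, b):
    """Classic DTW between two sequences of unit vectors (illustrative only)."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated-cost table
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = great_circle(a[i - 1], b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


# Two single-point "scanpaths" on either side of the +/-180 degree longitude seam.
left = to_unit_sphere([0.0], [np.radians(179.0)])
right = to_unit_sphere([0.0], [np.radians(-179.0)])
seam_distance_deg = np.degrees(spherical_dtw(left, right))
```

In this sketch the two points straddling the longitude seam come out roughly 2 degrees apart, whereas comparing raw longitudes would report 358 degrees. That is the kind of discontinuity a sphere-aware parameterization and distance are meant to avoid.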
