ScanGAN360: A Generative Model of Realistic Scanpaths for 360$^{\circ}$ Images

Daniel Martin; Ana Serrano; Alexander W. Bergman; Gordon Wetzstein; Belen Masia

TL;DR

ScanGAN360 presents a conditional GAN for generating realistic 360° gaze scanpaths by combining a sphere-aware 3D scanpath parameterization with a spherical DTW-based loss. A two-branch architecture uses panoramic convolutions and CoordConv to accommodate 360° distortions, enabling generation of long, diverse scanpaths (up to 30 seconds) at high speed (~$10^3$ scanpaths/s) without ground-truth one-to-one mappings. The approach outperforms prior 360° scanpath methods and closely approaches human baselines across multiple datasets, with thorough ablations validating the DTW_sph loss and architectural choices. Behavioral analyses show realistic exploration dynamics, equator bias, and inter-observer congruency, supporting practical deployment in VR content design, scanpath-driven thumbnails, and avatar gaze. The work also discusses limitations (fixed length, sampling rate) and outlines future directions for variable-length trajectories and low-level oculomotor dynamics, with code and models released for further research.

Abstract

Understanding and modeling the dynamics of human gaze behavior in 360$^\circ$ environments is a key challenge in computer vision and virtual reality. Generative adversarial approaches could alleviate this challenge by generating a large number of possible scanpaths for unseen images. Existing methods for scanpath generation, however, do not adequately predict realistic scanpaths for 360$^\circ$ images. We present ScanGAN360, a new generative adversarial approach to address this challenging problem. Our network generator is tailored to the specifics of 360$^\circ$ images representing immersive environments. Specifically, we use a spherical adaptation of dynamic time warping as a loss function and propose a novel parameterization of 360$^\circ$ scanpaths. The quality of our scanpaths outperforms competing approaches by a large margin and is almost on par with the human baseline. ScanGAN360 thus allows fast simulation of large numbers of virtual observers, whose behavior mimics real users, enabling a better understanding of gaze behavior and novel applications in virtual scene design.
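To make the two ingredients named in the abstract concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that a scanpath is stored as a sequence of 3D unit vectors on the sphere, and that the "spherical adaptation" of dynamic time warping amounts to using the great-circle (angular) distance as the local cost. The function names `latlon_to_xyz`, `great_circle`, and `spherical_dtw` are hypothetical. The sketch uses plain, non-differentiable DTW; a training loss would need a differentiable variant, which is not shown here.

```python
import numpy as np

def latlon_to_xyz(lat, lon):
    """Map latitude/longitude (radians) to 3D unit vectors on the sphere.

    Assumed parameterization for illustration only.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)

def great_circle(p, q):
    """Angular distance in radians between two unit vectors."""
    # Clip guards against dot products slightly outside [-1, 1] due to rounding.
    return float(np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)))

def spherical_dtw(a, b):
    """Classic O(n*m) DTW with great-circle distance as the local cost."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated-cost table
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = great_circle(a[i - 1], b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j],      # advance in a only
                                   acc[i, j - 1],      # advance in b only
                                   acc[i - 1, j - 1])  # advance in both
    return float(acc[n, m])

# Two gaze points on the equator, on opposite sides of the +/-180 degree seam.
left = latlon_to_xyz(0.0, np.deg2rad(179.0))
right = latlon_to_xyz(0.0, np.deg2rad(-179.0))
seam_distance = great_circle(left, right)  # about 2 degrees, in radians
```

The seam example shows why a sphere-aware representation matters: in equirectangular pixel coordinates the two points sit at opposite image borders, while on the sphere they are only about 2 degrees apart.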
