
Neural Radiance and Gaze Fields for Visual Attention Modeling in 3D Environments

Andrei Chubarau, Yinan Wang, James J. Clark

TL;DR

NeRGs address the challenge of visualizing gaze in 3D environments by augmenting a fixed NeRF with a lightweight gaze module that renders both scene appearance and a gaze density on 3D surfaces. The system supports decoupling the observer viewpoint from the rendering viewpoint and explicitly handles gaze occlusion through depth-based visibility tests. Training uses head pose as a proxy for gaze and aggregates the noisy head-pose rays into gaze probes, which supervise a NeRF-based gaze predictor; the result is interactive 3D gaze visualization in real-world scenes. This geometry-aware, interactive approach enables flexible exploration of attention in complex 3D environments, with potential applications in signage design, VR/AR interfaces, and shopper behavior analysis.
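The depth-based occlusion test can be illustrated with a minimal sketch, written under stated assumptions rather than taken from the authors' code. The rendering camera's depth places a surface point; a second depth query from the observer toward that point shows whether intervening geometry blocks it and, if so, which surface the observer actually faces. The callable `depth_along(origin, direction)` (a stand-in for volume-rendered NeRF depth), the tolerance `eps`, and the toy `wall_depth` scene are all hypothetical.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]

def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def norm(v: Vec3) -> float:
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)

def surface_point(origin: Vec3, direction: Vec3, depth: float) -> Vec3:
    """Point at distance `depth` along the ray origin + t * unit(direction)."""
    n = norm(direction)
    return tuple(o + depth * d / n for o, d in zip(origin, direction))

def observer_view(point: Vec3, observer: Vec3,
                  depth_along: Callable[[Vec3, Vec3], float],
                  eps: float = 1e-2) -> Tuple[Vec3, bool]:
    """Return (surface the observer actually sees toward `point`, occluded flag).

    `depth_along(origin, direction)` is a hypothetical stand-in for depth
    volume-rendered from the observer's viewpoint.
    """
    to_point = sub(point, observer)
    dist = norm(to_point)
    observed_depth = depth_along(observer, to_point)
    # Occluded if the observer's ray terminates noticeably before reaching the point.
    occluded = observed_depth < dist - eps
    # If occluded, gaze should be evaluated at the nearer surface the observer faces.
    seen = surface_point(observer, to_point, observed_depth) if occluded else point
    return seen, occluded

def wall_depth(origin: Vec3, direction: Vec3,
               wall_x: float = 1.0, far: float = 100.0) -> float:
    """Toy observer-side scene (hypothetical): one infinite wall in the plane x = wall_x."""
    dx = direction[0] / norm(direction)
    if dx <= 0.0 or origin[0] >= wall_x:
        return far  # ray never reaches the wall
    return (wall_x - origin[0]) / dx

# Suppose the rendering camera at x = 3, looking toward -x, renders depth 1.0,
# placing a visible surface at (2, 0, 0).
p = surface_point((3.0, 0.0, 0.0), (-1.0, 0.0, 0.0), 1.0)
# An observer at the origin has the wall at x = 1 in between, so the point is occluded.
print(observer_view(p, (0.0, 0.0, 0.0), wall_depth))  # ((1.0, 0.0, 0.0), True)
```

Returning the corrected point, not just a flag, lets gaze be evaluated on the surface the observer can actually see, which is the correction the Figure 3 caption describes for a decoupled observer.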

Abstract

We introduce Neural Radiance and Gaze Fields (NeRGs), a novel approach for representing visual attention in complex environments. Much like how Neural Radiance Fields (NeRFs) perform novel view synthesis, NeRGs reconstruct gaze patterns from arbitrary viewpoints, implicitly mapping visual attention to 3D surfaces. We achieve this by augmenting a standard NeRF with an additional network that models local egocentric gaze probability density, conditioned on scene geometry and observer position. The output of a NeRG is a rendered view of the scene alongside a pixel-wise salience map representing the conditional probability that a given observer fixates on visible surfaces. Unlike prior methods, our system is lightweight and enables visualization of gaze fields at interactive framerates. Moreover, NeRGs allow the observer perspective to be decoupled from the rendering camera and correctly account for gaze occlusion due to intervening geometry. We demonstrate the effectiveness of NeRGs using head pose from skeleton tracking as a proxy for gaze, employing our proposed gaze probes to aggregate noisy rays into robust probability density targets for supervision.
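One plausible way a gaze probe could turn noisy head-pose rays into a probability density target is a normalized histogram over ray directions. The sketch below assumes a simple azimuth/elevation grid with hypothetical resolutions `n_az` and `n_el`; the paper's actual binning, smoothing, or kernel choice may differ. Dividing each bin count by the number of rays and by the bin's solid angle gives a density per steradian that integrates to 1 over the sphere.

```python
import math
from typing import List, Sequence, Tuple

def direction_to_angles(d: Tuple[float, float, float]) -> Tuple[float, float]:
    """Angles of a ray direction: azimuth in [-pi, pi], elevation in [-pi/2, pi/2]."""
    x, y, z = d
    n = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)
    elevation = math.asin(max(-1.0, min(1.0, z / n)))  # clamp guards against rounding
    return azimuth, elevation

def bin_solid_angle(i: int, n_az: int, n_el: int) -> float:
    """Solid angle (steradians) of one azimuth bin in elevation row i."""
    el_lo = -math.pi / 2 + i * math.pi / n_el
    el_hi = el_lo + math.pi / n_el
    return (2 * math.pi / n_az) * (math.sin(el_hi) - math.sin(el_lo))

def gaze_probe_density(directions: Sequence[Tuple[float, float, float]],
                       n_az: int = 16, n_el: int = 8) -> List[List[float]]:
    """Hypothetical gaze probe: histogram of head-pose ray directions as density per steradian."""
    if not directions:
        raise ValueError("a gaze probe needs at least one ray")
    counts = [[0] * n_az for _ in range(n_el)]
    for d in directions:
        az, el = direction_to_angles(d)
        i = min(int((el + math.pi / 2) / math.pi * n_el), n_el - 1)
        j = min(int((az + math.pi) / (2 * math.pi) * n_az), n_az - 1)
        counts[i][j] += 1
    total = len(directions)
    return [[c / (total * bin_solid_angle(i, n_az, n_el)) for c in row]
            for i, row in enumerate(counts)]

# Three rays looking along +x and one along +y: the +x bin receives the highest density.
density = gaze_probe_density([(1.0, 0.0, 0.0)] * 3 + [(0.0, 1.0, 0.0)])
```

Normalizing by solid angle matters because lat-long bins shrink toward the poles; raw counts alone would not form a density on the sphere.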

Paper Structure

This paper contains 17 sections, 4 equations, and 8 figures.

Figures (8)

  • Figure 1: Visualization of 2D salience for a 3D scene. (Left) Scene rendered with a pre-trained NeRF. (Right) Gaze probability density predicted by DeepGaze IIe linardos2021deepgaze, a salience model that infers attention from 2D pixels without accounting for 3D geometry.
  • Figure 2: Overall system diagram of the Neural Radiance and Gaze Field (NeRG). We evaluate a NeRF with volume rendering and estimate gaze for the visible surfaces. Gaze is predicted from the observer's perspective, which can be decoupled from the rendering camera.
  • Figure 3: Gaze prediction in NeRG. Volume rendering of the NeRF produces a depth estimate along the traced ray. Gaze is evaluated between the position of the visible surface and the position of the observer. If the observer is decoupled from the rendering camera, gaze occlusion can be modeled by additionally evaluating depth from the observer's perspective to correct the position of the surface visible to the observer.
  • Figure 4: Egocentric gaze probability density for a randomly selected gaze probe (data from Wang2024eyeTracking), computed from the raw head pose rays (red dots) and visualized as a spherical heatmap.
  • Figure 5: Layout and dimensions (in meters) of the convenience store corresponding to the pose-tracking data from Wang2024eyeTracking. The blue point cloud shows the positions of individual gaze rays.
  • ...and 3 more figures
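Figure 3 relies on a depth estimate produced by volume rendering along each traced ray. A minimal sketch using the standard discrete NeRF compositing weights, w_i = T_i (1 - exp(-sigma_i delta_i)), is shown below, with depth taken as the weight-averaged sample distance. Normalizing by the accumulated weight and the `background_depth` fallback for rays that hit nothing are assumptions of this sketch; implementations differ on both.

```python
import math
from typing import List, Sequence

def render_weights(t_vals: Sequence[float], sigmas: Sequence[float]) -> List[float]:
    """Volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights = []
    transmittance = 1.0  # T_i: probability the ray reaches sample i unoccluded
    for i, (t, sigma) in enumerate(zip(t_vals, sigmas)):
        # Interval to the next sample; the last interval is treated as unbounded.
        delta = t_vals[i + 1] - t if i + 1 < len(t_vals) else 1e10
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this interval
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights

def expected_depth(t_vals: Sequence[float], sigmas: Sequence[float],
                   background_depth: float = math.inf) -> float:
    """Weight-averaged sample distance along one ray (normalization is an assumption)."""
    weights = render_weights(t_vals, sigmas)
    total = sum(weights)
    if total <= 0.0:
        return background_depth  # ray passed through empty space
    return sum(w * t for w, t in zip(weights, t_vals)) / total

# A single dense sample at t = 2.0 dominates, so the expected depth is about 2.0.
print(expected_depth([1.0, 1.5, 2.0, 2.5], [0.0, 0.0, 1000.0, 0.0]))
```

The same routine would be run twice when the observer is decoupled: once along the rendering camera's ray and once from the observer's position, as in the occlusion correction the caption describes.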