How Deep Is Your Gaze? Leveraging Distance in Image-Based Gaze Analysis

Maurice Koch, Nelusa Pathmanathan, Daniel Weiskopf, Kuno Kurzhals

TL;DR

This work tackles the issue of depth-induced variability in image-based gaze analysis by introducing depth-adaptive thumbnails that scale according to eye-to-object distance, preserving a consistent visual focus. The authors implement two patch strategies (classic fixed-angle and depth-adaptive constant-length) and evaluate them in AR using scanpath similarity (Smith-Waterman with 512-D ResNet features) and visualization methods (Gaze Stripes and image-based projections). Results show depth-adaptive thumbnails improve analysis quality, especially for mid and large patches, and enhance visualization coherence, demonstrated on a benchmark AR dataset of real and virtual objects across multiple distances. While promising, the study notes limitations in depth estimation robustness and feature representation, and calls for broader evaluation with diverse hardware and live-depth sensing scenarios.

Abstract

Image thumbnails are a valuable data source for fixation filtering, scanpath classification, and visualization of eye tracking data. They are typically extracted with a constant size, approximating the foveated area. As a consequence, the focused area of interest in the scene becomes less prominent in the thumbnail with increasing distance, affecting image-based analysis techniques. In this work, we propose depth-adaptive thumbnails, a method for varying image size according to the eye-to-object distance. Adjusting the visual angle relative to the distance leads to a zoom effect on the focused area. We evaluate our approach on recordings in augmented reality, investigating the similarity of thumbnails and scanpaths. Our quantitative findings suggest that considering the eye-to-object distance improves the quality of data analysis and visualization. We demonstrate the utility of depth-adaptive thumbnails for applications in scanpath comparison and visualization.
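
The relation between a fixed visual angle and a fixed physical length can be sketched with basic pinhole geometry. The snippet below is a minimal illustration, not the authors' implementation: it assumes the fixated point lies on the camera's optical axis, that eye and camera share the same position, and that the focal length in pixels (`focal_px`) and the target length (`c_m`) are known. All function names and the numeric values are hypothetical; only the three distances (0.5 m, 1.5 m, 3 m) come from the paper's benchmark.

```python
import math


def classic_patch_px(theta_deg: float, focal_px: float) -> float:
    """Side length (pixels) of a patch covering a fixed visual angle theta.

    Independent of distance: the patch always has the same size.
    """
    return 2.0 * focal_px * math.tan(math.radians(theta_deg) / 2.0)


def adaptive_angle_deg(c_m: float, distance_m: float) -> float:
    """Visual angle (degrees) that spans a constant length c at distance d."""
    return math.degrees(2.0 * math.atan(c_m / (2.0 * distance_m)))


def adaptive_patch_px(c_m: float, distance_m: float, focal_px: float) -> float:
    """Side length (pixels) of a patch covering a constant length c at distance d.

    Equivalent to classic_patch_px(adaptive_angle_deg(c, d), focal_px).
    """
    return focal_px * c_m / distance_m


if __name__ == "__main__":
    FOCAL_PX = 1000.0  # hypothetical camera focal length in pixels
    C_M = 0.3          # hypothetical constant length in metres
    for d in (0.5, 1.5, 3.0):  # viewing distances used in the benchmark
        angle = adaptive_angle_deg(C_M, d)
        size = adaptive_patch_px(C_M, d, FOCAL_PX)
        print(f"d={d} m: angle={angle:.1f} deg, patch={size:.0f} px")
```

Under these assumptions the adaptive patch shrinks in pixels as the distance grows; resizing every patch to a common thumbnail resolution then produces the zoom effect described above.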

Paper Structure (34 sections, 3 equations, 7 figures)

Figures (7)

  • Figure 1: A fixed field of vision, given by angle $\theta$, causes scene objects to become cropped when they are too close to the eye or to vanish at great distances. Setting the actual length $c$ to a constant value keeps the scene objects in the focus of the beholder by varying the eye's field of vision.
  • Figure 2: Benchmark scene comprising three real-world objects (R1–R3) and three virtual objects (V1–V3) on a desk. Left: Experiment space with three viewing locations positioned at distances of 50 cm, 150 cm, and 300 cm. Right: Scene view from the HoloLens 2.
  • Figure 3: Smith-Waterman scores (higher is better) between scanpaths from different depth levels of 0.5 m, 1.5 m, and 3 m. Similarity scores are computed between gaze patch sequences from two depth levels. Small (s), mid (m), and large (l) categories refer to different patch sizes.
  • Figure 4: Gaze Stripes generated from a recording where a participant moves toward a guitar. In classic thumbnails (left), the guitar changes its apparent size as the beholder moves toward it. In depth-adaptive thumbnails, the guitar's scale remains stable across different distances (right).
  • Figure 5: Gridified UMAP projection of classic thumbnails (left) and depth-adaptive thumbnails (right). In classic thumbnails, some fixations on the Aloe Vera are mapped outside the main cluster. In depth-adaptive thumbnails, all fixations on the Aloe Vera are clustered into a single region.
  • ...and 2 more figures
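
The Smith-Waterman scores in Figure 3 compare sequences of gaze patches. A minimal sketch of how such a local alignment score could be computed over feature vectors is given below. It is an illustration under stated assumptions, not the paper's scoring scheme: the substitution score (cosine similarity minus a `threshold`), the linear `gap` penalty, and both default values are hypothetical, and small toy vectors stand in for the 512-D ResNet features.

```python
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def smith_waterman(seq_a, seq_b, threshold: float = 0.5, gap: float = 0.5) -> float:
    """Smith-Waterman local alignment score between two feature sequences.

    Assumed scoring (hypothetical): a pair of patches contributes
    cosine_sim - threshold, so similar patches score positive and
    dissimilar ones negative; each gap costs `gap`.
    """
    n, m = len(seq_a), len(seq_b)
    H = np.zeros((n + 1, m + 1))  # H[i, j]: best local alignment ending at (i, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = cosine_sim(seq_a[i - 1], seq_b[j - 1]) - threshold
            H[i, j] = max(
                0.0,                  # start a new local alignment
                H[i - 1, j - 1] + s,  # align patch i with patch j
                H[i - 1, j] - gap,    # gap in seq_b
                H[i, j - 1] - gap,    # gap in seq_a
            )
    return float(H.max())


if __name__ == "__main__":
    e = np.eye(4)  # toy stand-ins for per-thumbnail feature vectors
    path_near = [e[0], e[1], e[2]]
    path_far = [e[0], e[1], e[2]]
    print(smith_waterman(path_near, path_far))
```

With this scoring, two scanpaths whose thumbnails yield similar features at different distances align along the diagonal and accumulate a high score, which is the behavior depth-adaptive thumbnails are meant to encourage.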