Real Eyes Realize Faster: Gaze Stability and Pupil Novelty for Efficient Egocentric Learning

Ajan Subramanian; Sumukh Bettadapura; Rohan Sathish

TL;DR

Modern eye-tracking headsets provide a continuous, training-free side channel that decomposes into two complementary axes: gaze fixation captures visual stability (quality), while pupil response captures arousal-linked moments (novelty). The paper operationalizes this insight as a Dual-Criterion Frame Curator that first gates frames by gaze quality and then ranks the survivors by pupil-derived novelty.

Abstract

Always-on egocentric cameras are increasingly used to capture demonstrations for embodied robotics, imitation learning, and assistive AR, but the resulting video streams are dominated by redundant and low-quality frames. Under the storage and battery constraints of wearable devices, choosing which frames to keep is as important as how to learn from them. We observe that modern eye-tracking headsets provide a continuous, training-free side channel that decomposes into two complementary axes: gaze fixation captures visual stability (quality), while pupil response captures arousal-linked moments (novelty). We operationalize this insight as a Dual-Criterion Frame Curator that first gates frames by gaze quality and then ranks the survivors by pupil-derived novelty. On the Visual Experience Dataset (VEDB), curated frames at a 10% budget match the classification performance of the full stream, whereas naive signal fusion consistently destroys both contributions. The benefit is task-dependent: pupil ranking improves activity recognition, while gaze-only selection already dominates for scene recognition, confirming that the two signals serve genuinely different roles. Our method requires no model inference and operates at capture time, offering a path toward efficient, always-on egocentric data curation.

Paper Structure (62 sections, 4 equations, 5 figures, 16 tables)

Introduction
Related Work
From Physiological Signals to Frame Scores
Frame-level alignment.
Gaze Quality Score $g(t)$.
Pupil Novelty Score $p(t)$.
Temporal alignment of pupil responses.
Dual-Criterion Frame Curator
Single-signal baselines.
Naive fusion.
Sequential composition: the Dual-Criterion Curator.
Strategy summary.
Experimental Setup
Dataset.
Feature Extraction.
...and 47 more sections

Figures (5)

Figure 1: Quality-Novelty Decomposition. (a) Gaze confidence (x) captures stability; pupil response (y) captures novelty. Random includes junk; gaze-only yields clean but redundant frames; dual targets high stability and novelty. (b) Two-stage pipeline: gaze gate (top 75%) $\rightarrow$ pupil ranking within budget.
Figure 2: Correlation between physiological signals and DINOv2 feature change. (a) Pupil derivative $|dp/dt|$ is positively correlated with feature change at all lags (mean $\rho = +0.038$). (b) Gaze quality $g(t)$ is negatively correlated ($\rho = -0.037$), confirming it tracks stability. Error bars: $\pm 1$ s.d. across sessions.
Figure 3: Learning curves: activity (left), scene (right). Dual at 10% budget matches the performance achieved using all frames. Shaded: $\pm 1$ s.d. (10 seeds).
Figure 4: Task-Dependent Performance. (a) Activity: pupil ranking improves over gaze-only and random. (b) Scene: gaze-only dominates; pupil adds no benefit.
Figure 5: Qualitative Comparison. Top: dual-criterion (high gaze, high pupil). Bottom: random baseline, including blur and low-information content.
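
The two-stage pipeline in Figure 1(b) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes per-frame scores $g(t)$ and $p(t)$ have already been computed and aligned to frames (their definitions are not given on this page), and the function name `dual_criterion_select`, the parameter names, and the tie-breaking and rounding choices are hypothetical. The 75% gate and 10% budget defaults are taken from the caption and abstract.

```python
import numpy as np


def dual_criterion_select(gaze, pupil, budget=0.10, gate_keep=0.75):
    """Sketch of a gaze gate followed by pupil ranking (hypothetical helper).

    gaze, pupil: 1-D arrays of precomputed per-frame scores (assumed inputs).
    budget:      fraction of all frames to keep in the end.
    gate_keep:   fraction of frames that pass the gaze gate.
    Returns the kept frame indices in temporal order.
    """
    gaze = np.asarray(gaze, dtype=float)
    pupil = np.asarray(pupil, dtype=float)
    if gaze.ndim != 1 or gaze.shape != pupil.shape:
        raise ValueError("gaze and pupil must be 1-D arrays of equal length")

    n = gaze.size
    n_keep = max(1, int(round(budget * n)))             # final budget in frames
    n_gate = max(n_keep, int(np.ceil(gate_keep * n)))   # survivors of the gate

    # Stage 1: keep the frames with the highest gaze quality.
    gate_idx = np.argsort(-gaze, kind="stable")[:n_gate]

    # Stage 2: among the survivors, keep the highest pupil novelty.
    order = np.argsort(-pupil[gate_idx], kind="stable")[:n_keep]
    return np.sort(gate_idx[order])


# Toy example: frames 1 and 4 have the highest pupil scores but the lowest
# gaze quality, so the gate drops them before ranking.
gaze_scores = [0.9, 0.1, 0.8, 0.7, 0.2, 0.95, 0.6, 0.5]
pupil_scores = [0.1, 0.99, 0.5, 0.9, 0.98, 0.2, 0.3, 0.4]
selected = dual_criterion_select(gaze_scores, pupil_scores, budget=0.25)
print(selected.tolist())
```

Applying the stages in sequence, instead of adding the two scores together, means a frame with poor gaze quality cannot be rescued by a large pupil response, which is one way to read the abstract's contrast between sequential composition and naive fusion.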
