Gaze Archive: Enhancing Human Memory through Active Visual Logging on Smart Glasses

Haoxin Ren, Feng Lu

TL;DR

Gaze Archive presents a gaze-driven paradigm for visual memory augmentation that aims to achieve intent-precise capture with effortless interaction. The GaHMA framework partitions scenes around gaze, encodes memories hierarchically with focal-detail and contextual summaries, and enables natural language retrieval, validated by GaVER-based quantitative evaluation and real-world usability studies. Results show superior recall efficiency and storage savings over non-gaze baselines, with users favoring the approach for lower effort and intrusion in both lab and real-world settings. The work demonstrates practical potential for wearable memory aids while outlining future directions for on-device LVLMs, privacy safeguards, and multimodal memory expansion. Overall, Gaze Archive advances memory augmentation by tightly coupling natural gaze cues with intelligent encoding and retrieval in AR wearables.

Abstract

People today are overwhelmed by massive amounts of information, leading to cognitive overload and memory burden. Traditional visual memory augmentation methods are either effortful and disruptive or fail to align with user intent. To address these limitations, we propose Gaze Archive, a novel visual memory enhancement paradigm based on active logging on smart glasses. It leverages human gaze as a natural attention indicator, enabling both intent-precise capture and effortless, unobtrusive interaction. To implement Gaze Archive, we develop GaHMA, a technical framework that enables compact yet intent-aligned memory encoding and intuitive memory recall based on natural language queries. Quantitative experiments on our newly constructed GaVER dataset show that GaHMA achieves more intent-precise logging than non-gaze baselines. Through extensive user studies in both laboratory and real-world scenarios, we compare Gaze Archive with existing memory augmentation methods. Results demonstrate its advantages in perceived effortlessness, unobtrusiveness, and overall preference, showing strong potential for real-world deployment.

Paper Structure

This paper contains 52 sections, 2 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: Pipeline of GaHMA. Capture: (a) smart glasses capture scene-gaze pairs along with other auxiliary information; Partitioning: (b) focal region localization based on gaze fixation and a foveal vision model; (c) contextual region expansion based on semantic analysis using LVLMs; Encoding: (d) region-specific encoding using LVLMs; (e) hybrid information storage for better recall quality; Retrieval: (f) question-driven retrieval from memory archives and answering with LVLMs.
  • Figure 2: Illustration of focal region partitioning. (a) The focal region is localized based on gaze fixation and a foveal vision model; (b) the focal region is expanded into the contextual region.
  • Figure 3: Example image and annotation (gaze point and Q&A pair) in the GaVER dataset.
  • Figure 4: Results of region-specific encoding experiment. (Left) Comparison of recall accuracy and storage efficiency on GaVER-core and GaVER-3k. (Right) Significance analysis results between different encoding strategies on recall accuracy. Error bars represent standard error, with significant differences marked by ** ($p<0.01$) and *** ($p<0.001$).
  • Figure 5: Recall accuracy and storage efficiency of various methods (a) with and without a background description and (b) under the hybrid storage setting (ctx_desc denotes the textual description generated with the corresponding encoding strategy).
  • ...and 6 more figures
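
The focal region step named in Figures 1(b) and 2(a) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `focal_region`, the pinhole-camera approximation, the square crop shape, and the default 5-degree visual angle are all assumptions made for illustration, and the captions do not specify how the foveal vision model is parameterized.

```python
import math


def focal_region(gaze_xy, image_size, hfov_deg, focal_angle_deg=5.0):
    """Return (left, top, right, bottom) of a square crop centred on the gaze point.

    Hypothetical helper: the name, the parameters, and the 5-degree default
    are illustrative assumptions, not values taken from the paper.
    """
    width, height = image_size
    gaze_x, gaze_y = gaze_xy

    # Pinhole model: focal length in pixels from the horizontal field of view.
    focal_px = (width / 2) / math.tan(math.radians(hfov_deg) / 2)

    # Half-side of a square subtending focal_angle_deg of visual angle.
    # Approximation: treats the gaze direction as close to the optical axis.
    half_side = focal_px * math.tan(math.radians(focal_angle_deg) / 2)

    # Clamp the crop to the image bounds.
    left = max(0, int(round(gaze_x - half_side)))
    top = max(0, int(round(gaze_y - half_side)))
    right = min(width, int(round(gaze_x + half_side)))
    bottom = min(height, int(round(gaze_y + half_side)))
    return left, top, right, bottom
```

A call such as `focal_region((960, 540), (1920, 1080), hfov_deg=90.0)` would return the pixel box to crop from the scene frame before any contextual expansion; the contextual expansion in Figure 2(b) relies on LVLM-based semantic analysis and is not covered by this sketch.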