Gaze Archive: Enhancing Human Memory through Active Visual Logging on Smart Glasses
Haoxin Ren, Feng Lu
TL;DR
Gaze Archive presents a gaze-driven paradigm for visual memory augmentation that aims to achieve intent-precise capture with effortless interaction. The GAHMA framework partitions scenes around gaze, encodes memories hierarchically with focal-detail and contextual summaries, and enables natural language retrieval; it is validated by quantitative evaluation on the GAVER dataset and by real-world usability studies. Results show superior recall efficiency and storage savings over non-gaze baselines, with users favoring the approach for its lower effort and intrusiveness in both lab and real-world settings. The work demonstrates practical potential for wearable memory aids while outlining future directions for on-device LVLMs, privacy safeguards, and multimodal memory expansion. Overall, Gaze Archive advances memory augmentation by tightly coupling natural gaze cues with intelligent encoding and retrieval in AR wearables.
Abstract
People today are overwhelmed by massive amounts of information, leading to cognitive overload and memory burden. Traditional visual memory augmentation methods are either effortful and disruptive or fail to align with user intent. To address these limitations, we propose Gaze Archive, a novel visual memory enhancement paradigm based on active logging on smart glasses. It leverages human gaze as a natural indicator of attention, enabling both intent-precise capture and effortless, unobtrusive interaction. To implement Gaze Archive, we develop GAHMA, a technical framework that enables compact yet intent-aligned memory encoding and intuitive memory recall based on natural language queries. Quantitative experiments on our newly constructed GAVER dataset show that GAHMA achieves more intent-precise logging than non-gaze baselines. Through extensive user studies in both laboratory and real-world scenarios, we compare Gaze Archive with existing memory augmentation methods. Results demonstrate its advantages in perceived effortlessness, unobtrusiveness, and overall preference, showing strong potential for real-world deployment.
