Heads Up eXperience (HUX): Always-On AI Companion for Human Computer Environment Interaction

Sukanth K, Sudhiksha Kandavel Rajan, Rajashekhar V S, Gowdham Prabhakar

TL;DR

This work introduces Heads Up eXperience (HUX), an always-on AI companion for human–computer environment interaction on XR smart glasses. It combines eye gaze, real-time scene analysis, and speech with a multi-modal memory system to support task-specific decisions. The architecture integrates a vision–language model, a large language model, and modular task-specific detectors, enabling real-time EOI/OOI detection, task-focused scene enhancement, and memory retrieval via Retrieval-Augmented Generation. Key contributions include a modular HUX AI architecture, an event-driven video filtering pipeline, task-specific scene processing with modular model switching, and a memory framework that supports context-rich, multi-modal recall. The approach aims to deliver natural, context-aware assistance directly in XR glasses for both personal and professional use, as a step toward deeper human–AI collaboration in daily life.

Abstract

While current personal smart devices excel in digital domains, they fall short in assisting users during human–environment interaction. This paper proposes Heads Up eXperience (HUX), an AI system designed to bridge this gap by serving as a constant companion across extended reality (XR) environments. By tracking the user's eye gaze, analyzing the surrounding environment, and interpreting verbal context, the system captures and enhances multi-modal data, providing holistic context interpretation and memory storage in real-time, task-specific situations. This comprehensive approach enables more natural, empathetic, and intelligent interactions between the user and HUX AI, paving the way for human–computer environment interaction. Intended for deployment in smart glasses and extended reality headsets, HUX AI aims to become a personal and useful AI companion for daily life. By integrating digital assistance with enhanced physical-world interactions, this technology has the potential to change human–AI collaboration in both personal and professional spheres and to shape the future of personal smart devices.

Paper Structure (50 sections, 15 figures, 4 tables)

This paper contains 50 sections, 15 figures, 4 tables.

Figures (15)

  • Figure 1: Concept of HUX AI. A video is available on YouTube (https://youtu.be/rM3la0N6vKM).
  • Figure 2: Capabilities of AI-enabled devices. A domain-specific model is optimized for precise, specialized tasks (indicated in green). A generalist model offers broad, adaptable language understanding and generation (indicated in orange). HUX AI integrates a generalist base with modular, task-specific layers for flexible and specialized applications (written in red).
  • Figure 3: HUX AI architecture: the origin of multi-modal data, data processing, multi-modal context processing, and the generation of outputs. "LIOU Stack" stands for "Last-In-Only-Used Stack".
  • Figure 4: Expected: real-time event-based video context processing using VLMs for detected events of interest.
  • Figure 5: Reality: real-time event-based video context processing using VLMs for detected events of interest. Some events are missed because the VLM is still processing a previous frame. The arrow labeled "VLM Computation Time" marks the interval in which the VLM is busy with the previous frame; events of interest that arrive during that interval are missed until the VLM is free.
  • ...and 10 more figures
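
The captions for Figures 3 and 5 mention a "Last-In-Only-Used" stack and events that are missed while the VLM is busy. This section does not describe how the authors implement either, so the following is only a minimal sketch. It assumes that "Last-In-Only-Used" means a single slot in which a newer event overwrites any older one still waiting, and that one VLM worker takes a fixed time per event. The names `LastInOnlyUsedSlot` and `simulate` are hypothetical and do not come from the paper.

```python
from typing import Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class LastInOnlyUsedSlot(Generic[T]):
    """Assumed semantics: holds at most one item; a push overwrites the waiting one."""

    def __init__(self) -> None:
        self._item: Optional[T] = None
        self.dropped = 0  # number of waiting items that were overwritten

    def push(self, item: T) -> None:
        if self._item is not None:
            self.dropped += 1
        self._item = item

    def pop(self) -> Optional[T]:
        item, self._item = self._item, None
        return item


def simulate(event_times: List[float], vlm_time: float) -> Tuple[List[float], int]:
    """Return (processed events, dropped count) for one worker with fixed latency."""
    slot: LastInOnlyUsedSlot[float] = LastInOnlyUsedSlot()
    processed: List[float] = []
    busy_until = 0.0  # time at which the (hypothetical) VLM worker becomes free
    for t in sorted(event_times):
        # If the worker freed up before this event, it already took the waiting item.
        if busy_until <= t:
            waiting = slot.pop()
            if waiting is not None:
                processed.append(waiting)
                busy_until += vlm_time
        slot.push(t)
        # If the worker is idle now, it starts on the new event immediately.
        if busy_until <= t:
            item = slot.pop()
            if item is not None:
                processed.append(item)
            busy_until = t + vlm_time
    # Whatever is still waiting at the end is processed last.
    leftover = slot.pop()
    if leftover is not None:
        processed.append(leftover)
    return processed, slot.dropped


if __name__ == "__main__":
    done, dropped = simulate([0, 1, 2, 3, 10], vlm_time=2.5)
    print("processed:", done, "dropped:", dropped)
```

Under these assumptions, an event that arrives while the worker is busy is lost only if a newer event replaces it before the worker is free, so the worker always resumes with the most recent context rather than a backlog of stale frames.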