GazeTrak: Exploring Acoustic-based Eye Tracking on a Glass Frame

Ke Li, Ruidong Zhang, Boao Chen, Siyuan Chen, Sicheng Yin, Saif Mahmud, Qikang Liang, François Guimbretière, Cheng Zhang

TL;DR

GazeTrak presents the first acoustic-based gaze tracking system integrated into glasses, using two speakers and eight microphones to emit inaudible FMCW signals and extract echo profiles for gaze inference via a ResNet-18–based model. The approach achieves a cross-session MGAE of around $4.9^\circ$ and an in-session MGAE of $3.6^\circ$ at 83.3 Hz, with a total power footprint of ~287.9 mW on a portable platform, and ~95.4 mW at 30 Hz when deployed on MAX78002 with optimized processing. A ground-truth calibration method using on-screen instruction points enables training without external trackers, and an MCU-based real-time pipeline demonstrates feasible deployment on a low-power device. The work highlights strong potential for long-duration, privacy-preserving gaze tracking in wearable contexts, while also outlining calibration, integration, and environmental robustness challenges for future work.
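To make the MGAE figures concrete, the sketch below shows one plausible way to compute a mean angular error between predicted and instructed on-screen gaze points. It reads MGAE as "mean gaze angular error" and assumes that points are given in centimetres relative to the spot on the screen directly in front of the eye, at a known eye-to-screen distance. The function names, the coordinate convention, and the single-viewpoint geometry are all illustrative assumptions, not the paper's stated evaluation code.

```python
import numpy as np


def gaze_angular_error_deg(pred_xy, true_xy, eye_to_screen_cm):
    """Angle in degrees between predicted and true gaze rays.

    pred_xy, true_xy: (N, 2) on-screen points in cm, measured from the
    point on the screen directly in front of the eye (assumed convention).
    """
    pred = np.asarray(pred_xy, dtype=float)
    true = np.asarray(true_xy, dtype=float)
    # Turn each 2D screen point into a 3D ray from the eye to the screen.
    z = np.full((pred.shape[0], 1), float(eye_to_screen_cm))
    v_pred = np.hstack([pred, z])
    v_true = np.hstack([true, z])
    cos = np.sum(v_pred * v_true, axis=1) / (
        np.linalg.norm(v_pred, axis=1) * np.linalg.norm(v_true, axis=1)
    )
    # Clip guards against rounding slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def mgae(pred_xy, true_xy, eye_to_screen_cm):
    """Mean of the per-sample angular errors, in degrees."""
    return float(np.mean(gaze_angular_error_deg(pred_xy, true_xy, eye_to_screen_cm)))
```

Under this geometry, a prediction that lands 60 cm to the side of the target on a screen 60 cm away is off by 45 degrees, so errors of a few degrees correspond to a few centimetres on a typical desktop setup.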

Abstract

In this paper, we present GazeTrak, the first acoustic-based eye tracking system on glasses. Our system needs only one speaker and four microphones attached to each side of the glasses. These acoustic sensors capture the formations of the eyeballs and the surrounding areas by emitting encoded inaudible sound towards the eyeballs and receiving the reflected signals. These reflected signals are further processed to calculate the echo profiles, which are fed to a customized deep learning pipeline to continuously infer the gaze position. In a user study with 20 participants, GazeTrak achieves an accuracy of 3.6° within the same remounting session and 4.9° across different sessions, with a refresh rate of 83.3 Hz and a power signature of 287.9 mW. Furthermore, we report the performance of our gaze tracking system fully implemented on an MCU with a low-power CNN accelerator (MAX78002). In this configuration, the system runs at up to 83.3 Hz and has a total power signature of 95.4 mW at 30 FPS.
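The echo-profile step described above can be sketched as a cross-correlation between each received audio frame and the transmitted chirp. This is a minimal illustration, not the authors' pipeline: the 18-21 kHz band comes from the Figure 2 caption, while the 50 kHz sampling rate and 600-sample frame are assumptions chosen because they reproduce the stated 83.3 Hz refresh rate. The function names are hypothetical, and any filtering or frame differencing the real system may apply is omitted.

```python
import numpy as np

FS = 50_000                   # assumed sampling rate in Hz (not stated here)
FRAME = 600                   # assumed samples per chirp: 50_000 / 600 ≈ 83.3 Hz
F0, F1 = 18_000.0, 21_000.0   # sweep band from the Figure 2 caption


def make_chirp(n=FRAME, fs=FS, f0=F0, f1=F1):
    """Linear up-chirp sweeping f0 -> f1 over n samples."""
    t = np.arange(n) / fs
    duration = n / fs
    phase = 2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / duration * t**2)
    return np.cos(phase)


def echo_profiles(received, chirp):
    """Circularly cross-correlate each received frame with the chirp.

    Returns an array of shape (num_frames, frame_len); column k holds the
    correlation at a delay of k samples, so peaks mark echo arrival times.
    """
    n = len(chirp)
    num_frames = len(received) // n
    frames = np.asarray(received[: num_frames * n], dtype=float).reshape(num_frames, n)
    # Correlation theorem: corr = IFFT(FFT(frame) * conj(FFT(chirp))).
    spectrum = np.fft.fft(frames, axis=1) * np.conj(np.fft.fft(chirp))
    return np.real(np.fft.ifft(spectrum, axis=1))


# Usage: simulate a single reflection delayed by 25 samples.
chirp = make_chirp()
received = np.tile(np.roll(chirp, 25), 4)
profiles = echo_profiles(received, chirp)
```

Stacking such profiles over time, one per microphone, gives image-like inputs of the kind a CNN such as the ResNet-18-based model can consume.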

Paper Structure

This paper contains 49 sections, 1 equation, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: Echo Profiles of Different Microphones when Moving Gaze to Different Regions of the Screen.
  • Figure 2: Overview of the System: Use the Speaker on the Right Side (18-21 kHz) for Illustration.
  • Figure 3: Hardware and Form Factor for GazeTrak: (a) Speaker board; (b) Microphone board (front view); (c) Microphone board (back view); (d) Customized PCB board for the audio chip NXP SGTL5000; (e) Teensy 4.1; (f) Glasses form factor with speakers and microphones attached (M1-8: microphones, S1-2: speakers); (g) Attachable and more compact prototype; (h) MAX78002 Evaluation Kit.
  • Figure 4: MGAE Distribution across Participants.
  • Figure 5: Deployed on Glasses with Various Frame Styles.