SARA: Smart AI Reading Assistant for Reading Comprehension

Enkeleda Thaqi, Mohamed Mantawy, Enkelejda Kasneci

TL;DR

SARA supports real-time reading comprehension by combining eye tracking in a mixed-reality (MR) headset with GPT-4-powered reading assistance. The system detects reading difficulties from gaze dwell time and regressions, localizes on-screen text via QR cues and OCR, maps gaze to text regions, and presents contextual definitions, translations, and paraphrases as augmented overlays. Its end-to-end pipeline (text localization, gaze alignment, difficulty detection, and AR-delivered assistance) demonstrates the feasibility of gaze-driven, personalized reading augmentation in MR. The work highlights the potential of combining MR technology with advanced LLMs to improve reading efficiency and accessibility; broader evaluation and impact assessment are left to future work.
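As an illustration of how dwell time and regressions could flag difficult words, here is a minimal sketch. It is not the authors' implementation: the `Fixation` type, the function name, and both thresholds are assumptions chosen for the example, and it presumes fixations have already been mapped to word indices in reading order.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Fixation:
    """One fixation, already mapped to a word (hypothetical structure)."""
    word_index: int     # position of the fixated word in reading order
    duration_ms: float  # how long the gaze rested on it


def detect_difficult_words(
    fixations: Sequence[Fixation],
    dwell_threshold_ms: float = 600.0,  # assumed value, not from the paper
    regression_threshold: int = 2,      # assumed value, not from the paper
) -> List[int]:
    """Return sorted indices of words flagged by dwell time or regressions."""
    dwell: Dict[int, float] = {}
    regressions: Dict[int, int] = {}
    previous = None
    for fix in fixations:
        # Accumulate total dwell time per word across all visits.
        dwell[fix.word_index] = dwell.get(fix.word_index, 0.0) + fix.duration_ms
        # A regression is a backward jump to an earlier word.
        if previous is not None and fix.word_index < previous:
            regressions[fix.word_index] = regressions.get(fix.word_index, 0) + 1
        previous = fix.word_index

    flagged = {w for w, total in dwell.items() if total >= dwell_threshold_ms}
    flagged |= {w for w, n in regressions.items() if n >= regression_threshold}
    return sorted(flagged)
```

A word is flagged if either signal crosses its threshold. A real system would likely calibrate the thresholds per user and per text rather than use fixed constants.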

Abstract

SARA integrates eye tracking and state-of-the-art large language models in a mixed reality framework to enhance the reading experience by providing personalized assistance in real time. By tracking eye movements, SARA identifies the text segments that attract most of the user's attention and may therefore indicate uncertainty or comprehension issues. The process involves three key steps: text detection and extraction, gaze tracking and alignment, and assessment of detected reading difficulty. The resulting assistance is presented directly within the user's field of view as virtual overlays on the text areas identified as difficult. This support helps users overcome challenges such as unfamiliar vocabulary and complex sentences by offering additional context, rephrased text, and multilingual help. SARA's innovative approach demonstrates the potential to transform the reading experience and improve reading proficiency.
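The gaze alignment step can be pictured as a lookup from a gaze point to the bounding box of a detected word. The sketch below is one simple way to do that, not the method described by the authors: `WordBox`, `word_at_gaze`, and the `margin` tolerance are hypothetical names and values, and it assumes the gaze point and the text boxes share one 2D coordinate frame.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class WordBox:
    """Axis-aligned bounding box of one detected word (hypothetical)."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def word_at_gaze(
    gaze_x: float,
    gaze_y: float,
    boxes: Sequence[WordBox],
    margin: float = 5.0,  # assumed tolerance for gaze-estimation noise
) -> Optional[int]:
    """Return the index of the word under the gaze point, or None.

    If the point falls inside several margin-expanded boxes, the box
    whose center is closest to the gaze point wins.
    """
    best_index = None
    best_dist = None
    for i, box in enumerate(boxes):
        inside_x = box.x - margin <= gaze_x <= box.x + box.width + margin
        inside_y = box.y - margin <= gaze_y <= box.y + box.height + margin
        if not (inside_x and inside_y):
            continue
        cx = box.x + box.width / 2
        cy = box.y + box.height / 2
        dist = (gaze_x - cx) ** 2 + (gaze_y - cy) ** 2
        if best_dist is None or dist < best_dist:
            best_index, best_dist = i, dist
    return best_index
```

The margin makes the lookup tolerant of small gaze errors near word boundaries. Ties between neighbouring words are resolved by distance to the box center, which is a design choice made for this sketch.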
