AIDEN: Design and Pilot Study of an AI Assistant for the Visually Impaired
Luis Marquez-Carpintero, Francisco Gomez-Donoso, Zuria Bauer, Bessie Dominguez-Dager, Alvaro Belmonte-Baeza, Mónica Pina-Navarro, Francisco Morillas-Espejo, Felix Escalona, Miguel Cazorla
TL;DR
AIDEN aims to increase the autonomy of visually impaired users by combining real-time object detection (YOLO) and OCR/VQA (LLaVA) with a novel continuous haptic guidance system, in order to reduce auditory overload and protect privacy. The system uses a hybrid architecture, with server-based AI and a smartphone interface, to provide Text-To-Speech, scene description, and object finding without storing personal data. A pilot study with seven participants shows high perceived usefulness and ease of use, with the "Find an Object" function delivering near real-time feedback and strong adoption potential. The findings suggest that multimodal haptic-visual feedback can improve daily usability and independence compared with traditional audio-centric approaches, justifying larger-scale clinical validations.
Abstract
This paper presents AIDEN, an artificial intelligence-based assistant designed to enhance the autonomy and daily quality of life of visually impaired individuals, who often struggle with object identification, text reading, and navigation in unfamiliar environments. Existing solutions such as screen readers or audio-based assistants facilitate access to information but frequently lead to auditory overload and raise privacy concerns in open environments. AIDEN addresses these limitations with a hybrid architecture that integrates You Only Look Once (YOLO) for real-time object detection and a Large Language and Vision Assistant (LLaVA) for scene description and Optical Character Recognition (OCR). A key novelty of the system is a continuous haptic guidance mechanism based on a Geiger-counter metaphor, which supports object centering without occupying the auditory channel, while privacy is preserved by ensuring that no personal data are stored. Empirical evaluations with visually impaired participants assessed perceived ease of use and acceptance using the Technology Acceptance Model (TAM). Results indicate high user satisfaction, particularly regarding intuitiveness and perceived autonomy. Moreover, the "Find an Object" function achieved effective real-time performance. These findings provide promising evidence that multimodal haptic-visual feedback can improve daily usability and independence compared to traditional audio-centric methods, motivating larger-scale clinical validations.
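To make the Geiger-counter metaphor concrete, the following is a minimal sketch of how a detected bounding box could be turned into a vibration pulse rate, with pulses coming faster as the object approaches the center of the camera frame. The abstract does not specify the actual mapping, so everything here is an assumption for illustration: the function names `centering_error` and `pulse_interval_ms`, the linear mapping, and the 80 ms and 800 ms limits are all hypothetical.

```python
import math


def centering_error(box, frame_w, frame_h):
    """Normalized offset of a box center from the frame center.

    `box` is (x_min, y_min, x_max, y_max) in pixels (an assumed format).
    Returns 0.0 when the box is centered and 1.0 at a frame corner.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Offsets scaled to [-1, 1] along each axis.
    dx = (cx - frame_w / 2.0) / (frame_w / 2.0)
    dy = (cy - frame_h / 2.0) / (frame_h / 2.0)
    # Divide by sqrt(2) so a corner maps to exactly 1.0, then clamp.
    return min(1.0, math.hypot(dx, dy) / math.sqrt(2.0))


def pulse_interval_ms(error, min_ms=80.0, max_ms=800.0):
    """Map a centering error to the pause between vibration pulses.

    Hypothetical linear mapping: a small error gives a short interval
    (rapid pulses), a large error gives a long interval (slow pulses).
    """
    error = max(0.0, min(1.0, error))
    return min_ms + (max_ms - min_ms) * error


if __name__ == "__main__":
    # A box centered in a 640x480 frame yields the fastest pulse rate.
    centered = centering_error((300, 220, 340, 260), 640, 480)
    print(pulse_interval_ms(centered))  # 80.0
```

In a real system the interval would drive the phone's vibration motor on each detection update; a nonlinear mapping or smoothing across frames might feel more natural, but those choices would need to be validated with users.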
