Audo-Sight: AI-driven Ambient Perception Across Edge-Cloud for Blind and Low Vision Users

Jacob Bradshaw; Mohsen Riahi Alam; Bhanuja Ainary; Minseo Kim; Mohsen Amini Salehi

Abstract

Despite advances in assistive technologies, Blind and Low-Vision (BLV) individuals continue to face challenges in understanding their surroundings. Delivering concise, useful, and timely scene descriptions for ambient perception remains a long-standing accessibility problem. To address this, we introduce Audo-Sight, an AI-driven assistive system spanning edge and cloud that enables BLV individuals to perceive their surroundings through voice-based conversational interaction. Audo-Sight employs a set of expert and generic AI agents, each supported by dedicated processing pipelines distributed across edge and cloud. It analyzes each user query for urgency and contextual information to infer user intent, then dynamically routes the query, along with a scene frame, to the most suitable pipeline. When users require fast responses, the system uses the edge and cloud processing pipelines simultaneously: the edge quickly generates an initial response, while the cloud provides more detailed and accurate information. To combine these outputs seamlessly, we introduce the Response Fusion Engine, which fuses the fast edge response with the more accurate cloud output, ensuring timely and accurate responses for BLV users. Systematic evaluation shows that, compared to a commercial cloud-based solution, Audo-Sight delivers speech output around 80% faster for urgent tasks and generates complete responses approximately 50% faster across all tasks, highlighting the effectiveness of our edge-cloud design. In a human evaluation, 62% of BLV participants preferred Audo-Sight over GPT-5, and another 23% stated that both perform comparably.
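The concurrent edge-cloud behavior described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`edge_stream`, `cloud_answer`, `answer`), the latencies, and the result layout are all hypothetical, and the actual Response Fusion Engine logic is not specified here. The sketch only shows the general pattern of starting a fast edge stream, waiting for the slower cloud answer, and interrupting the edge stream once the cloud answer is ready.

```python
import asyncio


async def edge_stream(tokens, spoken, delay=0.01):
    """Hypothetical edge model: emits short tokens quickly, one at a time."""
    for tok in tokens:
        await asyncio.sleep(delay)
        spoken.append(tok)  # stands in for handing a token to text-to-speech


async def cloud_answer(text, latency=0.05):
    """Hypothetical cloud model: slower, returns one detailed answer."""
    await asyncio.sleep(latency)
    return text


async def answer(urgent, edge_tokens, cloud_text):
    """Route a query; urgent queries run edge and cloud concurrently."""
    if not urgent:
        # Assumption: non-urgent queries go to the cloud pipeline only.
        return {"spoken_early": [], "final": await cloud_answer(cloud_text)}

    spoken = []
    edge_task = asyncio.create_task(edge_stream(edge_tokens, spoken))
    final = await cloud_answer(cloud_text)

    # The cloud answer is ready: interrupt the edge stream if it is still running.
    edge_task.cancel()
    try:
        await edge_task
    except asyncio.CancelledError:
        pass

    # Placeholder for fusion: keep what was already spoken, then the cloud answer.
    return {"spoken_early": spoken, "final": final}


if __name__ == "__main__":
    tokens = ["car", "approaching", "from", "the", "left"] * 10
    result = asyncio.run(answer(True, tokens, "A detailed scene description."))
    print(len(result["spoken_early"]), "edge tokens spoken before the cloud answer")
```

In this sketch the user hears the first edge tokens almost immediately, while the final answer always comes from the cloud; how the two are merged into one coherent utterance is the part the paper assigns to its Response Fusion Engine.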

Paper Structure (21 sections, 7 figures, 1 table, 1 algorithm)

Introduction
Related Work
Edge-Cloud Systems for BLV Accessibility
Smart BLV Accessibility Products
Audo-Sight: A Multi-Modal Smart Assistive Technology across Edge-Cloud
Audo-Sight Architecture
Input Management Module
Cognition Module
Urgency Detector
AI Router
Reasoning Engine
Response Management Module
Blind-Friendly Response Editor
Response Fusion Engine
Experimental Study / Evaluation
...and 6 more sections

Figures (7)

Figure 1: Overview of the Audo-Sight framework that can provide conversational ambient perception for BLV individuals.
Figure 2: A bird's-eye view of the Audo-Sight architecture and its primary components.
Figure 3: Internal mechanics of the Cognition and Response Management Modules of the Audo-Sight platform.
Figure 4: Schematic view of Response Fusion Engine. The edge MLLM processing is interrupted (red X symbol in the figure) once the higher quality cloud MLLM response is ready.
Figure 5: (a) Comparison of time to first token (TTFT) for urgent and normal tasks across three systems. (b) Comparison of end-to-end latency for urgent and normal tasks across three systems.
...and 2 more figures
