SweeperBot: Making 3D Browsing Accessible through View Analysis and Visual Question Answering
Chen Chen, Cuong Nguyen, Alexa Siu, Dingzeyu Li, Nadir Weibel
TL;DR
Blind and low-vision (BLV) users struggle to access and compare 3D content, especially in online shopping and design contexts. SweeperBot couples a novel three-stage visual question answering (VQA) pipeline with a screen reader (SR)-accessible editable table. The pipeline answers visual questions from multiple sampled views of a 3D model, using CLIP to score view relevance, Grounding DINO for object recognition, and LLM/MLLM-based reasoning to produce the final answer. The system was assessed in an expert review with BLV users (n=10), and the generated descriptions were validated in a survey of sighted participants (n=30); together these indicate that the approach is feasible for helping BLV users explore and compare 3D models. The findings suggest practical applicability to e-commerce, education, and GenAI-driven 3D workflows, with potential extensions to larger scenes and real-world deployments.
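The view-sampling and relevance-ranking idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the Fibonacci-sphere camera sampling, the function names, and the toy embedding vectors are all assumptions made for illustration. In a real system the vectors would come from a CLIP text encoder (for the question) and image encoder (for each rendered view).

```python
import math

def fibonacci_sphere(n, radius=1.0):
    """Return n roughly evenly spaced camera positions on a sphere.

    Assumption: the paper does not specify this sampling scheme here;
    it is used only as a common way to spread viewpoints around an object.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n      # height in (-1, 1)
        r = math.sqrt(1.0 - y * y)         # ring radius at that height
        theta = golden_angle * i
        points.append((radius * r * math.cos(theta),
                       radius * y,
                       radius * r * math.sin(theta)))
    return points

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_views(question_vec, view_vecs, k=3):
    """Return indices of the k views most similar to the question.

    question_vec and view_vecs are placeholders for CLIP text and
    image embeddings (hypothetical inputs in this sketch).
    """
    ranked = sorted(range(len(view_vecs)),
                    key=lambda i: cosine(question_vec, view_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Usage with toy 2-D "embeddings" standing in for real CLIP vectors.
cameras = fibonacci_sphere(8)
best = top_k_views([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2)
```

Only the top-ranked views would then be passed on to the recognition and reasoning stages, which keeps the number of images the later models must process small.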
Abstract
Accessing 3D models remains challenging for Screen Reader (SR) users. Although some existing 3D viewers allow creators to provide alternative text, that text often lacks sufficient detail about the 3D models. Grounded in a formative study, this paper introduces SweeperBot, a system that enables SR users to explore and compare 3D models through visual question answering. SweeperBot answers SR users' visual questions by combining an optimal view selection technique with the strengths of generative- and recognition-based foundation models. An expert review with 10 Blind and Low-Vision (BLV) users with SR experience demonstrated the feasibility of using SweeperBot to assist BLV users in exploring and comparing 3D models. The quality of the descriptions generated by SweeperBot was validated in a second survey study with 30 sighted participants.
