BIV-Priv-Seg: Locating Private Content in Images Taken by People With Visual Impairments
Yu-Yun Tseng, Tanusree Sharma, Lotus Zhang, Abigale Stangl, Leah Findlater, Yang Wang, Danna Gurari
TL;DR
This work introduces BIV-Priv-Seg, the first localization dataset originating from people with visual impairments that shows private content. It provides segmentation annotations for 16 private object categories across 1,028 images, with 967 instance segmentations. The work assesses zero-shot and few-shot localization methods (DeFRCN, YOLACT) and vision-language grounding models (GLaMM, GroundingDINO+SAM), finding that current approaches still struggle to locate private objects that are non-salient, small, or lacking text, and to recognize when private content is absent from an image. Global and localized analyses reveal distinctive dataset traits, such as high text prevalence and objects frequently near image borders, which influence model performance. The publicly released dataset and evaluation framework support the development of privacy-aware grounding and localization tools for BLV users and broader contexts, and the work calls for privacy-centric improvements in vision-language models.
Abstract
Individuals who are blind or have low vision (BLV) are at a heightened risk of sharing private information if they share photographs they have taken. To facilitate developing technologies that can help them preserve privacy, we introduce BIV-Priv-Seg, the first localization dataset originating from people with visual impairments that shows private content. It contains 1,028 images with segmentation annotations for 16 private object categories. We first characterize BIV-Priv-Seg and then evaluate modern models' performance for locating private content in the dataset. We find that modern models struggle most with locating private objects that are non-salient, are small, or lack text, as well as with recognizing when private content is absent from an image. We facilitate future extensions by sharing our new dataset, along with an evaluation server, at https://vizwiz.org/tasks-and-datasets/object-localization.
