BIV-Priv-Seg: Locating Private Content in Images Taken by People With Visual Impairments

Yu-Yun Tseng, Tanusree Sharma, Lotus Zhang, Abigale Stangl, Leah Findlater, Yang Wang, Danna Gurari

TL;DR

This work introduces BIV-Priv-Seg, the first localization dataset originating from people with visual impairments that shows private content. It provides segmentation annotations for 16 private object categories across 1,028 images, with 967 instance segmentations. The paper evaluates few-shot localization methods (DeFRCN, YOLACT) and zero-shot vision-language grounding models (GLaMM, GroundingDINO+SAM), finding that current approaches still struggle to locate private objects that are non-salient, small, or lacking text, and to recognize when an image contains no private content. Analyses of the dataset reveal distinctive traits, such as a high prevalence of objects containing text and objects that frequently touch image borders, which influence model performance. The dataset is publicly released with an evaluation server to support privacy-aware grounding and localization tools for BLV users and broader contexts, along with a call for privacy-centric improvements in vision-language models.

Abstract

Individuals who are blind or have low vision (BLV) are at a heightened risk of sharing private information if they share photographs they have taken. To facilitate developing technologies that can help them preserve privacy, we introduce BIV-Priv-Seg, the first localization dataset originating from people with visual impairments that shows private content. It contains 1,028 images with segmentation annotations for 16 private object categories. We first characterize BIV-Priv-Seg and then evaluate modern models' performance for locating private content in the dataset. We find that modern models struggle most with locating private objects that are not salient, are small, and lack text, as well as with recognizing when private content is absent from an image. We facilitate future extensions by sharing our new dataset with the evaluation server at https://vizwiz.org/tasks-and-datasets/object-localization.

Paper Structure (27 sections, 5 figures, 6 tables)

Figures (5)

  • Figure 1: We introduce BIV-Priv-Seg, a dataset containing segmented private objects. An example is shown for each category.
  • Figure 2: Examples from the BIV-Priv-Seg dataset illustrating its unique aspects: (a) high proportion of image areas and borders occupied by the objects, (b) high prevalence of objects containing text, and (c) images lacking target objects.
  • Figure 3: Comparison of our dataset to four existing few-shot localization datasets with respect to the (a) number of annotated objects per image, (b) number of segments (i.e., disconnected areas) per annotated object, and (c) percentage of annotated objects containing text.
  • Figure 4: Comparison of our dataset to four existing few-shot localization datasets with respect to the (a) border rate, (b) center deviation, (c) image coverage, and (d) boundary complexity. PASCAL-5i and FSOD are excluded for boundary complexity as they lack the necessary segmentation annotations to calculate this metric.
  • Figure 5: Qualitative results from DeFRCN in the 1-shot setting for object detection, YOLACT in the 1-shot setting for instance segmentation and object detection, and GLaMM and GroundingDINO+SAM using the prompt I scenario for the zero-shot setting.
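
The captions for Figures 2 and 4 refer to per-object statistics such as image coverage, border contact, and center deviation, but this summary does not give their formulas. The sketch below shows one plausible way to compute such statistics from a binary segmentation mask. The definitions are assumptions made for illustration (coverage as the fraction of mask pixels, border contact as any mask pixel on the outermost row or column, center deviation as the centroid's distance from the image center divided by the half-diagonal), and the helper name `mask_stats` is hypothetical. The paper's exact definitions may differ.

```python
from math import hypot


def mask_stats(mask):
    """Summarize a binary mask given as a list of rows of 0/1 values.

    All three statistics use ASSUMED definitions chosen for illustration;
    they are not taken from the BIV-Priv-Seg paper.
    Returns None when the mask contains no object pixels.
    """
    height = len(mask)
    width = len(mask[0])
    pixels = [(r, c) for r in range(height) for c in range(width) if mask[r][c]]
    if not pixels:
        return None  # image without a target object

    # Assumed "image coverage": fraction of image pixels inside the object.
    coverage = len(pixels) / (height * width)

    # Assumed border contact: any object pixel on the outermost row or column.
    touches_border = any(
        r in (0, height - 1) or c in (0, width - 1) for r, c in pixels
    )

    # Assumed "center deviation": centroid distance from the image center,
    # normalized by the half-diagonal so the value lies in [0, 1].
    centroid_r = sum(r for r, _ in pixels) / len(pixels)
    centroid_c = sum(c for _, c in pixels) / len(pixels)
    center_r, center_c = (height - 1) / 2, (width - 1) / 2
    half_diagonal = hypot(center_r, center_c)
    if half_diagonal:
        deviation = hypot(centroid_r - center_r, centroid_c - center_c) / half_diagonal
    else:
        deviation = 0.0

    return {
        "coverage": coverage,
        "touches_border": touches_border,
        "center_deviation": deviation,
    }


# Toy 4x4 mask with a 2x2 object in the top-left corner.
corner_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
corner_stats = mask_stats(corner_mask)
```

Under these assumed definitions, a dataset-level border rate would be the fraction of annotated objects for which `touches_border` is true, and images lacking target objects would be those whose masks make `mask_stats` return `None`.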