Open-Vocabulary Object Detectors: Robustness Challenges under Distribution Shifts

Prakash Chandra Chhipa, Kanjar De, Meenakshi Subhash Chippa, Rajkumar Saini, Marcus Liwicki

TL;DR

This study presents a comprehensive robustness evaluation of the zero-shot capabilities of three recent open-vocabulary (OV) foundation object detection models: OWL-ViT, YOLO World, and Grounding DINO.

Abstract

The challenge of Out-Of-Distribution (OOD) robustness remains a critical hurdle to deploying deep vision models. Vision-Language Models (VLMs) have recently achieved groundbreaking results. VLM-based open-vocabulary object detection extends the capabilities of traditional object detection frameworks, enabling the recognition and classification of objects beyond predefined categories. Investigating OOD robustness in recent open-vocabulary object detection is essential to increase the trustworthiness of these models. This study presents a comprehensive robustness evaluation of the zero-shot capabilities of three recent open-vocabulary (OV) foundation object detection models: OWL-ViT, YOLO World, and Grounding DINO. Experiments on the robustness benchmarks COCO-O, COCO-DC, and COCO-C, which encompass distribution shifts due to information loss, corruption, adversarial attacks, and geometrical deformation, highlight the challenges to the models' robustness and are intended to foster research toward achieving it. Project page: https://prakashchhipa.github.io/projects/ovod_robustness

Paper Structure

This paper contains 16 sections, 1 equation, 7 figures, and 7 tables.

Figures (7)

  • Figure 1: Zero-shot performance comparison of open-vocabulary object detection models: OWL-ViT [minderer2022simple] (ECCV'22), YOLO World [cheng2024yolow] (CVPR'24), and Grounding DINO [liu2023grounding] (ECCV'24). COCO-O [mao2023coco] (ICCV'23) results are averaged over six subsets, and COCO-C [michaelis2019benchmarking] results are averaged over fifteen corruptions.
  • Figure 2: Zero-shot performance on COCO-DC. (left): OWL-ViT, YOLO World, and Grounding DINO compared on the original subset and the adversarial subset. (right): the same models compared on the original subset and the average of all remaining subsets.
  • Figure 3: Examples from six COCO-O benchmark subsets depicted with predictions by open-vocabulary models: OWL-ViT, YOLO World, and Grounding DINO. The input textual query includes the object categories identified in the labels.
  • Figure 4: Zero-shot evaluation process for open-vocabulary object detection models.
  • Figure 5: Comparisons of effective robustness for detectors based on their performance on original COCO and COCO-O datasets.
  • ...and 2 more figures