Robustness Analysis on Foundational Segmentation Models

Madeline Chantry Schiappa, Shehreen Azad, Sachidanand VS, Yunhao Ge, Ondrej Miksik, Yogesh S. Rawat, Vibhav Vineet

TL;DR

This work benchmarks the robustness of seven segmentation models, including multimodal Visual Foundation Models (VFMs), against real-world distribution shifts using 17 perturbations across MS COCO-P and ADE20K-P. It introduces the relative robustness $\gamma^r$ and absolute robustness $\gamma^a$ metrics and reveals that VFMs are not uniformly more robust than unimodal baselines, though they can offer competitive zero-shot performance and object-category-specific resilience. The study provides two perturbation-based datasets and a comprehensive analysis of how texture-preserving versus texture-distorting corruptions impact performance, offering insights and a benchmark to guide future robustness-aware development of segmentation foundation models. The findings underscore the need to bolster compression/blur robustness and to understand the trade-offs between robustness and standard segmentation accuracy in practical deployments.

Abstract

Due to the increase in computational resources and accessibility of data, large deep learning models trained on copious amounts of multi-modal data using self-supervised or semi-supervised learning have emerged. These "foundation" models are often adapted to a variety of downstream tasks like classification, object detection, and segmentation with little-to-no training on the target dataset. In this work, we perform a robustness analysis of Visual Foundation Models (VFMs) for segmentation tasks and focus on robustness against perturbations inspired by real-world distribution shifts. We benchmark seven state-of-the-art segmentation architectures on two perturbed datasets, MS COCO-P and ADE20K-P, with 17 different perturbations at 5 severity levels each. Our findings reveal several key insights: (1) VFMs exhibit vulnerabilities to compression-induced corruptions, (2) despite not outpacing all unimodal models in robustness, multimodal models show competitive resilience in zero-shot scenarios, and (3) VFMs demonstrate enhanced robustness for certain object categories. These observations suggest that our robustness evaluation framework sets new requirements for foundational models, encouraging further advancements to bolster their adaptability and performance. The code and dataset are available at: https://tinyurl.com/fm-robust.

Paper Structure (18 sections, 1 equation, 20 figures, 9 tables)

Figures (20)

  • Figure 1: Data perturbation examples from the MS COCO-P dataset, where the original sample is zoomed in to show different corruptions. Each image pair shows a corruption at severity 3 and 5. The top row shows Gaussian noise and darkness, whereas the bottom row shows fog and snow.
  • Figure 1: Results for each corruption and each severity for instance segmentation measured by average precision (AP) on the MS COCO-P dataset. x-axis: Severity ranges from 0 (no corruption) to 5 (most corruption). y-axis: AP results for instance segmentation.
  • Figure 2: Relative robustness score $\gamma^r$ on instance segmentation for the MS COCO-P dataset. The y-axis denotes the models we evaluated and the x-axis denotes $\gamma^r$ for each corruption averaged over severity.
  • Figure 2: Results for each corruption and each severity for instance segmentation measured by average precision (AP) on the ADE20K-P dataset. x-axis: Severity ranges from 0 (no corruption) to 5 (most corruption). y-axis: AP results for instance segmentation.
  • Figure 3: Relative robustness score $\gamma^r$ on semantic segmentation for the ADE20K-P dataset. The y-axis denotes the models we evaluated and the x-axis denotes $\gamma^r$ for each corruption averaged over severity.
  • ...and 15 more figures
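
The captions above report the relative robustness score $\gamma^r$ averaged over severity. The text here names the two metrics but does not define them, so the following is only a minimal sketch under an assumption: it uses the commonly seen forms $\gamma^a = 1 - (A_c - A_p)/100$ and $\gamma^r = 1 - (A_c - A_p)/A_c$, where $A_c$ is the score on clean data and $A_p$ the score on perturbed data, both on a 0 to 100 scale. The function names and the example scores are hypothetical and are not results from the paper.

```python
from statistics import mean


def absolute_robustness(clean_score: float, perturbed_score: float) -> float:
    """Assumed form: 1 - (A_c - A_p) / 100, with scores on a 0-100 scale."""
    return 1.0 - (clean_score - perturbed_score) / 100.0


def relative_robustness(clean_score: float, perturbed_score: float) -> float:
    """Assumed form: 1 - (A_c - A_p) / A_c, i.e. the drop relative to clean."""
    if clean_score <= 0:
        raise ValueError("clean_score must be positive")
    return 1.0 - (clean_score - perturbed_score) / clean_score


def mean_over_severity(metric, clean_score, perturbed_scores):
    """Average a robustness metric over the severity levels of one corruption."""
    return mean(metric(clean_score, p) for p in perturbed_scores)


# Hypothetical scores for one model and one corruption (not from the paper):
# clean AP, then AP at severity levels 1 through 5.
clean_ap = 40.0
ap_by_severity = [36.0, 32.0, 28.0, 24.0, 20.0]

gamma_r = mean_over_severity(relative_robustness, clean_ap, ap_by_severity)
gamma_a = mean_over_severity(absolute_robustness, clean_ap, ap_by_severity)
print(f"relative robustness: {gamma_r:.2f}")
print(f"absolute robustness: {gamma_a:.2f}")
```

Under these assumed definitions, a model with a low clean score can look robust in absolute terms while losing a large fraction of its performance, which is why the two scores are worth reading side by side.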