Robustness Analysis on Foundational Segmentation Models
Madeline Chantry Schiappa, Shehreen Azad, Sachidanand VS, Yunhao Ge, Ondrej Miksik, Yogesh S. Rawat, Vibhav Vineet
TL;DR
This work benchmarks the robustness of seven segmentation models, including multimodal Visual Foundation Models, against real-world distribution shifts using 17 perturbations across MS COCO-P and ADE20K-P. It introduces the relative robustness $\nabla^r$ and absolute robustness $\nabla^a$ metrics and reveals that VFMs are not uniformly more robust than unimodal baselines, though they can offer competitive zero-shot performance and object-category-specific resilience. The study provides two perturbation-based datasets and a comprehensive analysis of how texture-preserving versus texture-distorting corruptions impact performance, offering insights and a benchmark to guide future robustness-aware development of segmentation foundation models. The findings underscore the need to bolster compression/blur robustness and to understand the trade-offs between robustness and standard segmentation accuracy in practical deployments.
Abstract
Owing to increased computational resources and data accessibility, a growing number of large deep learning models trained on copious amounts of multi-modal data with self-supervised or semi-supervised learning have emerged. These ``foundation'' models are often adapted to a variety of downstream tasks, such as classification, object detection, and segmentation, with little-to-no training on the target dataset. In this work, we perform a robustness analysis of Visual Foundation Models (VFMs) for segmentation tasks, focusing on robustness against perturbations inspired by real-world distribution shifts. We benchmark seven state-of-the-art segmentation architectures on two perturbed datasets, MS COCO-P and ADE20K-P, each built from 17 different perturbations at 5 severity levels. Our findings reveal several key insights: (1) VFMs exhibit vulnerabilities to compression-induced corruptions, (2) although multimodal models do not outperform all unimodal models in robustness, they show competitive resilience in zero-shot scenarios, and (3) VFMs demonstrate enhanced robustness for certain object categories. These observations suggest that our robustness evaluation framework sets new requirements for foundation models, encouraging further advances to bolster their adaptability and performance. The code and dataset are available at: \url{https://tinyurl.com/fm-robust}.
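To make the scale of the benchmark concrete, the following minimal Python sketch enumerates the evaluation grid described above (two perturbed datasets, 17 perturbations, 5 severity levels) and shows one plausible way to score robustness from clean and perturbed performance. The abstract does not define the robustness metrics, so the formulas in `absolute_robustness` and `relative_robustness` are assumptions made for illustration, as are the placeholder perturbation names and all function names; they should not be read as the paper's definitions.

```python
from itertools import product

# Figures taken from the abstract.
DATASETS = ("MS COCO-P", "ADE20K-P")
NUM_PERTURBATIONS = 17
SEVERITY_LEVELS = (1, 2, 3, 4, 5)

# Placeholder names: the abstract does not list the individual perturbations.
PERTURBATIONS = [f"perturbation_{i}" for i in range(1, NUM_PERTURBATIONS + 1)]


def evaluation_grid():
    """Return every (dataset, perturbation, severity) setting to evaluate."""
    return list(product(DATASETS, PERTURBATIONS, SEVERITY_LEVELS))


def absolute_robustness(clean_score, perturbed_score):
    """ASSUMED definition: 1 minus the absolute drop, with scores in [0, 100]."""
    return 1.0 - (clean_score - perturbed_score) / 100.0


def relative_robustness(clean_score, perturbed_score):
    """ASSUMED definition: 1 minus the drop relative to the clean score."""
    return 1.0 - (clean_score - perturbed_score) / clean_score


grid = evaluation_grid()
print(len(grid))  # 2 datasets * 17 perturbations * 5 severities = 170 settings per model

# Hypothetical scores, used only to exercise the two functions.
print(absolute_robustness(50.0, 40.0))
print(relative_robustness(50.0, 40.0))
```

Under these assumed definitions, a model that loses 10 points from a clean score of 50 is penalized more by the relative score than by the absolute one, which is why reporting both can separate models with low clean accuracy from models that are genuinely stable under perturbation.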
