SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images
Josh Myers-Dean, Jarek Reynolds, Brian Price, Yifei Fan, Danna Gurari
TL;DR
The paper introduces SPIN (SubPartImageNet), the first natural-image dataset with exhaustive subpart annotations, covering 203 subpart categories and extending hierarchical segmentation to subpart granularity. It also proposes two evaluation metrics, the Spatial Consistency Score (SpCS) and the Semantic Consistency Score (SeCS), which quantify cross-level spatial containment and semantic entailment across object–part–subpart hierarchies and supplement traditional IoU measures. Benchmarks on open-vocabulary localization, interactive segmentation, and zero-shot semantic recognition reveal substantial gaps in subpart understanding, notable gains when models are trained on SPIN data, and strong cross-level containment in some models. The work provides a public dataset and an evaluation framework to advance fine-grained hierarchical segmentation, with practical implications for captioning, visual QA, AR, and accessibility.
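To make the idea of cross-level spatial containment concrete, the following is a minimal sketch of a containment ratio between a child mask (for example, a subpart) and its parent mask (for example, a part). It is an illustrative assumption, not the paper's definition of SpCS: the function name `containment_ratio`, the list-of-lists mask format, and the toy masks are all hypothetical.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 marks a pixel that belongs to the region


def containment_ratio(child: Mask, parent: Mask) -> float:
    """Fraction of child pixels that also lie inside the parent mask.

    Illustrative only; this is NOT the paper's exact SpCS formula.
    """
    child_area = 0
    inside = 0
    for child_row, parent_row in zip(child, parent):
        for c, p in zip(child_row, parent_row):
            if c:
                child_area += 1
                if p:
                    inside += 1
    # An empty child mask violates nothing, so treat it as fully contained.
    return inside / child_area if child_area else 1.0


# Toy 3x4 masks (hypothetical data): the part lies fully inside the object,
# while the subpart has one of its two pixels outside the part.
object_mask = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0]]
part_mask = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
subpart_mask = [[1, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]

part_in_object = containment_ratio(part_mask, object_mask)
subpart_in_part = containment_ratio(subpart_mask, part_mask)
```

Under this sketch, a value of 1.0 means the child region is fully contained in its parent, and lower values indicate pixels that spill outside the parent, which is the kind of cross-level inconsistency that IoU computed at a single level would not reveal.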
Abstract
Hierarchical segmentation entails creating segmentations at varying levels of granularity. We introduce the first hierarchical semantic segmentation dataset with subpart annotations for natural images, which we call SPIN (SubPartImageNet). We also introduce two novel evaluation metrics that measure how well algorithms capture spatial and semantic relationships across hierarchical levels. We benchmark modern models on three different tasks and analyze their strengths and weaknesses across objects, parts, and subparts. To facilitate community-wide progress, we publicly release our dataset at https://joshmyersdean.github.io/spin/index.html.
