
PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition

Xiao Li, Yining Liu, Na Dong, Sitian Qin, Xiaolin Hu

TL;DR

The paper addresses robustness gaps in deep object recognition by introducing PIN++, a large-scale dataset with high-quality part annotations for all ImageNet-1K (IN-1K) categories, and a Multi-scale Part-supervised Model (MPM) that leverages these annotations. PIN++ enables the generation of pseudo part labels for unlabeled IN-1K images, while MPM injects lightweight, multi-resolution part supervision into a standard backbone without increasing inference cost. Across adversarial attacks, common corruptions, and several out-of-distribution (OOD) datasets, MPM improves both robustness and alignment with human vision, and it boosts downstream detection performance when used for backbone initialization. The work highlights the practical value of an explicit part-based inductive bias for robust recognition and broader visual understanding tasks. Key contributions include PIN++, the pseudo-label-driven training paradigm, and the MPM architecture, which uses high-resolution part annotations to achieve robust recognition on a large-scale benchmark.

Abstract

Deep learning-based object recognition systems can be easily fooled by various adversarial perturbations. One reason for this weak robustness may be that, unlike the human recognition process, they lack a part-based inductive bias. Motivated by this, several part-based recognition models have been proposed to improve adversarial robustness. However, due to the lack of part annotations, the effectiveness of these methods has only been validated on small-scale, nonstandard datasets. In this work, we propose PIN++, short for PartImageNet++, a dataset providing high-quality part segmentation annotations for all categories of ImageNet-1K (IN-1K). With these annotations, we build part-based methods directly on the standard IN-1K dataset for robust recognition. Unlike previous two-stage part-based models, we propose a Multi-scale Part-supervised Model (MPM) that learns a robust representation from part annotations. Experiments show that MPM yields better adversarial robustness than strong baselines on the large-scale IN-1K across various attack settings. Furthermore, MPM achieves improved robustness on common corruptions and several out-of-distribution datasets. The dataset, together with these results, enables and encourages researchers to explore the potential of part-based models in more real-world applications.
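
The abstract does not spell out MPM's training objective. One common way to combine object-level and part-level supervision, shown here only as a minimal sketch under stated assumptions, is a weighted sum of the classification cross-entropy and a per-pixel part-segmentation cross-entropy computed at several resolutions: L = L_cls + lam * (sum over scales s of L_part at scale s). The weight `lam`, the strided downsampling of part masks, and every function name below are hypothetical illustrations, not the authors' formulation.

```python
import math
from typing import List, Sequence

Mask = List[List[int]]  # integer part label per pixel


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])


def downsample_mask(mask: Mask, factor: int) -> Mask:
    """Hypothetical helper: strided (nearest-neighbour style) subsampling of a part map."""
    return [row[::factor] for row in mask[::factor]]


def part_seg_loss(part_logits: List[List[List[float]]], mask: Mask) -> float:
    """Mean per-pixel cross-entropy; part_logits[i][j] holds one score per part class."""
    losses = [cross_entropy(part_logits[i][j], mask[i][j])
              for i in range(len(mask)) for j in range(len(mask[0]))]
    return sum(losses) / len(losses)


def mpm_style_loss(class_logits: Sequence[float], class_target: int,
                   part_logits_per_scale: List[List[List[List[float]]]],
                   full_mask: Mask, factors: List[int], lam: float = 0.5) -> float:
    """Assumed objective: classification loss plus lam times the part loss at each scale."""
    loss = cross_entropy(class_logits, class_target)
    for part_logits, factor in zip(part_logits_per_scale, factors):
        loss += lam * part_seg_loss(part_logits, downsample_mask(full_mask, factor))
    return loss


# Usage: a 4x4 part map (3 part classes) supervised at 2x2 and 1x1 resolutions.
mask = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 1, 1], [2, 2, 1, 1]]
logits_2x2 = [[[0.0] * 3 for _ in range(2)] for _ in range(2)]
logits_1x1 = [[[0.0] * 3]]
total = mpm_style_loss([0.0, 0.0], 0, [logits_2x2, logits_1x1], mask, factors=[2, 4])
print(round(total, 4))  # log(2) + 0.5 * 2 * log(3) = log(6), about 1.7918
```

With uniform (all-zero) logits, each loss term reduces to the log of the number of classes, which makes the weighting easy to check by hand.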

Paper Structure

This paper contains 30 sections, 6 figures, 13 tables.

Figures (6)

  • Figure 1: Examples of annotated images in PIN++. High-quality part segmentation annotations are provided for all categories in IN-1K. The object names are shown on the top-right of each image. The part names are hidden here.
  • Figure 2: Comparison between part segmentation results of different methods and PIN++ annotations. Without training on PIN++, VLPart and SAM fail to segment objects into specific parts with accurate semantics.
  • Figure 3: An overview of the generation of pseudo-labels and the structure of MPM. (a) A part segmentation model is trained on PIN++ and used to obtain pseudo part labels for unannotated images. (b) MPM adds several auxiliary bypass layers to the vanilla recognition model for part segmentation supervision. MPM is trained with the part annotations together with the pseudo part labels. During inference, the auxiliary layers are dropped, and the vanilla recognition model gives the final object category prediction.
  • Figure S1: Visual comparison between annotations of PIN and PIN++. The object names are shown on the top-right of the IN-1K columns. The part names are shown in the PIN and PIN++ images.
  • Figure S2: Visualization of pseudo part labels generated by a Mask R-CNN trained on PIN++. The object names are shown on the top-right of each image. The part names are hidden here for clarity.
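
Figure 3 describes two mechanics: part targets come from human PIN++ annotations where available and from a trained part segmenter (Figure S2 uses a Mask R-CNN trained on PIN++) otherwise, and the auxiliary part layers run only during training. The pure-Python sketch below illustrates both under stated assumptions; `Sample`, `build_part_targets`, `PartSupervisedModel`, and the toy backbone, classifier, and heads are hypothetical placeholders rather than the authors' code, and a real implementation would use a deep learning framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Mask = List[List[int]]


@dataclass
class Sample:
    image_id: str
    label: int
    part_mask: Optional[Mask] = None  # human PIN++ annotation, if the image has one


def build_part_targets(samples: List[Sample],
                       part_segmenter: Callable[[str], Mask]) -> Dict[str, Mask]:
    """Prefer human annotations; fall back to a pseudo label from the trained segmenter."""
    targets: Dict[str, Mask] = {}
    for s in samples:
        targets[s.image_id] = s.part_mask if s.part_mask is not None else part_segmenter(s.image_id)
    return targets


class PartSupervisedModel:
    """Toy stand-in for a backbone with auxiliary part heads (hypothetical)."""

    def __init__(self, backbone, classifier, aux_heads):
        self.backbone = backbone      # image -> list of multi-scale features
        self.classifier = classifier  # deepest feature -> class logits
        self.aux_heads = aux_heads    # one part head per scale, used only in training
        self.training = True

    def forward(self, image):
        features = self.backbone(image)
        logits = self.classifier(features[-1])
        if not self.training:
            return logits, []  # auxiliary heads skipped: same cost as the vanilla model
        part_outputs = [head(f) for head, f in zip(self.aux_heads, features)]
        return logits, part_outputs


# Usage: one annotated and one unannotated image.
samples = [Sample("a", 3, [[1, 0]]), Sample("b", 7)]
fake_segmenter = lambda image_id: [[0, 0]]  # hypothetical stand-in for the PIN++-trained segmenter
targets = build_part_targets(samples, fake_segmenter)
print(targets)  # {'a': [[1, 0]], 'b': [[0, 0]]}

calls = {"aux": 0}


def counting_head(feature):
    calls["aux"] += 1  # count how often an auxiliary head runs
    return feature


model = PartSupervisedModel(
    backbone=lambda img: [img, img * 2],
    classifier=lambda f: [f, -f],
    aux_heads=[counting_head, counting_head],
)
_, parts = model.forward(1.0)          # training: two auxiliary part outputs
model.training = False
logits, no_parts = model.forward(1.0)  # inference: classifier only
print(len(parts), len(no_parts), calls["aux"])  # 2 0 2
```

Because the auxiliary heads hang off the backbone rather than sitting in the classification path, switching to inference simply skips them, which is why the sketch's eval-mode output matches a model that never had them.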