ProtoSeg: Interpretable Semantic Segmentation with Prototypical Parts

Mikołaj Sacha, Dawid Rymarczyk, Łukasz Struski, Jacek Tabor, Bartosz Zieliński

TL;DR

ProtoSeg reframes semantic segmentation as a prototype-driven task, where pixel predictions are grounded in prototypical patches from the training data. A novel prototype diversity loss based on Jeffrey's divergence encourages same-class prototypes to cover diverse semantic concepts, enhancing interpretability. The approach, compatible with multiple backbones and validated on Pascal VOC and Cityscapes, yields competitive accuracy while increasing transparency through explicit prototype activations. This work advances explainable segmentation by delivering human-understandable, patch-based explanations without requiring external annotation or post hoc reasoning.

Abstract

We introduce ProtoSeg, a novel model for interpretable semantic image segmentation, which constructs its predictions using similar patches from the training set. To achieve accuracy comparable to baseline methods, we adapt the mechanism of prototypical parts and introduce a diversity loss function that increases the variety of prototypes within each class. We show that ProtoSeg discovers semantic concepts, in contrast to standard segmentation models. Experiments conducted on Pascal VOC and Cityscapes datasets confirm the precision and transparency of the presented method.
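
To make the diversity idea above concrete, here is a minimal sketch of Jeffrey's divergence (the symmetrized KL divergence) between the activation distributions of two same-class prototypes. The softmax normalization, the unweighted sum of the two KL terms, and the `diversity_penalty` function are assumptions made for illustration; the summary above does not give the exact loss formula.

```python
import math

def softmax(scores):
    """Turn raw activations into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def jeffreys_divergence(p, q):
    """Symmetrized KL divergence: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def diversity_penalty(p, q):
    """Hypothetical penalty: close to 1 when p and q overlap, close to 0 when they differ."""
    return math.exp(-jeffreys_divergence(p, q))

# Toy activations of three same-class prototypes over four feature-map points.
proto_a = softmax([5.0, 0.0, 0.0, 0.0])  # peaks at point 0
proto_b = softmax([5.0, 0.0, 0.0, 0.0])  # peaks at point 0 as well
proto_c = softmax([0.0, 0.0, 0.0, 5.0])  # peaks at point 3

same_area = diversity_penalty(proto_a, proto_b)
different_area = diversity_penalty(proto_a, proto_c)
print(same_area, different_area)
```

In this toy setup, two prototypes that activate on the same point receive a large penalty, while prototypes that activate on different points receive a penalty near zero, so minimizing such a term would push same-class prototypes apart.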

Paper Structure (21 sections, 8 equations, 8 figures, 4 tables)


Figures (8)

  • Figure 1: In contrast to existing methods, ProtoSeg provides an interpretation of the resulting segmentation. For this purpose, it operates on patches selected from the training set (prototypes) that correspond to parts of the segmented objects. For a bus, prototypes can correspond to windows or wheels, represented by red and orange colors, respectively.
  • Figure 2: Prototype activation maps generated by ProtoSeg for four prototypes from class cat (columns) and three sample images from PASCAL VOC 2012 (rows). Maps differ from each other, e.g. prototype 1 concentrates on the cat's nose, while prototype 4 activates mostly on the cat's neck. We see that ProtoSeg can derive semantic concepts using prototypical cases from the training dataset.
  • Figure 3: ProtoSeg consists of a backbone network $f$, prototype layer $g$, and a fully connected layer $h$. While the backbone network processes the image as a whole, the prototype and fully connected layers consider each $z$ from feature map $f(x)$ separately. The final segmentation is obtained by interpolating the output map corresponding to class probability.
  • Figure 4: Comparison between high and low values of $\mathcal{L}_{\hbox{\scriptsize J}}$ for the activation of two prototypes. $\mathcal{L}_{\hbox{\scriptsize J}}$ has a high value if two prototypes of the same class activate in the same area (a). For this reason, we add $\mathcal{L}_{\hbox{\scriptsize J}}$ as an additional component of the loss function to increase the variety of prototypes within each class (b).
  • Figure 5: Histograms showing the assignment of feature map points to prototypes per class on Cityscapes. By assignment, we mean finding the most highly activated prototype for a given feature map point. The top row shows the model trained with the diversity loss $\mathcal{L}_{\hbox{\scriptsize J}}$, and the bottom row shows the model trained without it. The diversity loss increases the number of prototypes that ProtoSeg actually uses.
  • ...and 3 more figures
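
The pipeline in Figure 3 and the prototype assignment in Figure 5 can be sketched in a few lines of plain Python. This is an illustrative toy, not the authors' implementation: the backbone output $f(x)$ is replaced by a hand-written 2×2 feature map, the similarity `1 / (1 + squared distance)` is an assumed stand-in for the actual prototype activation, and nearest-neighbor upsampling of labels stands in for interpolating the class-probability map.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prototype_layer(z, prototypes):
    """g: similarity of one feature vector z to every prototype (assumed form)."""
    return [1.0 / (1.0 + sq_dist(z, p)) for p in prototypes]

def fully_connected(activations, weights):
    """h: class scores as weighted sums of prototype activations."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def segment(feature_map, prototypes, weights):
    """Apply g and h to each point of the feature map independently."""
    class_map, assign_map = [], []
    for row in feature_map:
        class_row, assign_row = [], []
        for z in row:
            acts = prototype_layer(z, prototypes)
            class_row.append(argmax(fully_connected(acts, weights)))
            assign_row.append(argmax(acts))  # most highly activated prototype
        class_map.append(class_row)
        assign_map.append(assign_row)
    return class_map, assign_map

def upsample_nearest(label_map, factor):
    """Simplified stand-in for interpolating the output map to image size."""
    return [[v for v in row for _ in range(factor)]
            for row in label_map for _ in range(factor)]

# Toy stand-in for the backbone output f(x): a 2x2 map of 2-D features.
feature_map = [[[0.1, 0.0], [0.9, 0.1]],
               [[5.0, 5.1], [6.1, 4.9]]]
# Prototypes 0 and 1 belong to class 0; prototypes 2 and 3 belong to class 1.
prototypes = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
weights = [[1.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.0]]

class_map, assign_map = segment(feature_map, prototypes, weights)
segmentation = upsample_nearest(class_map, 2)
histogram = Counter(p for row in assign_map for p in row)
print(class_map, assign_map, dict(histogram))
```

Counting entries of `assign_map` per prototype gives the kind of usage histogram that Figure 5 reports: a prototype that never wins the argmax for any feature map point contributes nothing to the explanation.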