This Looks Like That: Deep Learning for Interpretable Image Recognition

Chaofan Chen, Oscar Li, Chaofan Tao, Alina Jade Barnett, Jonathan Su, Cynthia Rudin

TL;DR

ProtoPNet introduces a prototype-based deep network that enables interpretable image recognition through case-based reasoning by matching image parts to learned prototypes. The model is trained end-to-end with image-level labels using clustering, separation, and prototype-projection steps, and it visualizes decisions via prototypical part activations. Evaluations on CUB-200-2011 and Stanford Cars show competitive accuracy versus non-interpretable baselines, with further gains when ensembling multiple ProtoPNets, while providing explanations in terms of prototypical parts and their similarities. The work highlights a practical path to faithful, interpretable deep learning for fine-grained recognition, supported by visualization techniques and open-source code.

Abstract

When we are faced with challenging image classification tasks, we often explain our reasoning by dissecting the image and pointing out prototypical aspects of one class or another. The mounting evidence for each of the classes helps us make our final decision. In this work, we introduce a deep network architecture, the prototypical part network (ProtoPNet), that reasons in a similar way: the network dissects the image by finding prototypical parts, and combines evidence from the prototypes to make a final classification. The model thus reasons in a way that is qualitatively similar to the way ornithologists, physicians, and others would explain to people how to solve challenging image classification tasks. The network uses only image-level labels for training, without any annotations for parts of images. We demonstrate our method on the CUB-200-2011 dataset and the Stanford Cars dataset. Our experiments show that ProtoPNet can achieve accuracy comparable to that of its analogous non-interpretable counterpart, and that when several ProtoPNets are combined into a larger network, the combined network can achieve an accuracy on par with some of the best-performing deep models. Moreover, ProtoPNet provides a level of interpretability that is absent in other interpretable deep models.
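The "finds prototypical parts, then combines evidence" reasoning can be illustrated with a small sketch. It assumes the convolutional backbone has already produced a list of latent patch vectors for one image. The log-ratio similarity follows the formulation in the ProtoPNet paper but is not spelled out in this summary, and the names `prototype_similarities`, `class_logits`, and `eps`, along with the toy numbers, are hypothetical choices for illustration only.

```python
import math

def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def prototype_similarities(patches, prototypes, eps=1e-4):
    """For each prototype, return (similarity, index of best-matching patch).

    The similarity log((d + 1) / (d + eps)) decreases as the squared distance d
    grows, so the maximum similarity over patches is attained at the nearest
    patch. This mirrors max-pooling over a similarity map.
    """
    results = []
    for p in prototypes:
        dists = [squared_l2(z, p) for z in patches]
        best = min(range(len(dists)), key=dists.__getitem__)
        d = dists[best]
        results.append((math.log((d + 1.0) / (d + eps)), best))
    return results

def class_logits(similarities, weights):
    """Combine prototype evidence linearly; weights[k][j] links prototype j to class k."""
    return [sum(w * s for w, s in zip(row, similarities)) for row in weights]

# Toy example: three latent patches, two prototypes (one per class).
patches = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
prototypes = [[1.0, 0.0], [0.0, 1.5]]
weights = [[1.0, -0.5],   # class 0: positive weight on its own prototype
           [-0.5, 1.0]]   # class 1

sims = prototype_similarities(patches, prototypes)
logits = class_logits([s for s, _ in sims], weights)
predicted = max(range(len(logits)), key=logits.__getitem__)
```

The best-matching patch index returned for each prototype is what makes a "this part looks like that prototype" explanation possible: it points to the image region responsible for the evidence.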

Paper Structure

This paper contains 10 sections, 1 theorem, 2 equations, 2 figures, 1 table.

Key Result

Theorem 2.1

Let $h \circ g_{\mathbf{p}} \circ f$ be a ProtoPNet. For each $k$, $l$, we use $\mathbf{b}^k_l$ to denote the value of the $l$-th prototype for class $k$ before the projection of $\mathbf{p}^k_l$ to the nearest latent training patch of class $k$, and use $\mathbf{a}^k_l$ to denote its value after the

Figures (2)

  • Figure 1: Image of a clay colored sparrow and how parts of it look like some learned prototypical parts of a clay colored sparrow used to classify the bird's species.
  • Figure 2: ProtoPNet architecture.

Theorems & Definitions (1)

  • Theorem 2.1
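
Theorem 2.1 concerns the projection of each prototype onto the nearest latent training patch of its own class. A minimal sketch of that projection step is given below, assuming the latent patches of all training images have already been extracted and tagged with their image's class label. The function name `project_prototypes` and the toy data are hypothetical, and the sketch makes no claim about how the authors' released code organizes this step.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def project_prototypes(prototypes, proto_class, patches, patch_class):
    """Replace each prototype with its nearest latent patch of the same class.

    prototypes[j] belongs to class proto_class[j]; patches[i] comes from a
    training image of class patch_class[i]. Patches of other classes are
    ignored even if they are closer.
    """
    projected = []
    for p, k in zip(prototypes, proto_class):
        candidates = [z for z, c in zip(patches, patch_class) if c == k]
        if not candidates:
            # Assumption for this sketch: keep the prototype unchanged if its
            # class has no training patches.
            projected.append(list(p))
            continue
        nearest = min(candidates, key=lambda z: squared_l2(z, p))
        projected.append(list(nearest))
    return projected

# Toy example: the class-1 patch [0.85, 0.15] is closest to prototype 0,
# but prototype 0 belongs to class 0, so it is projected onto [1.0, 0.0].
prototypes = [[0.9, 0.1], [0.0, 1.2]]
proto_class = [0, 1]
patches = [[1.0, 0.0], [0.0, 1.0], [0.85, 0.15], [0.0, 2.0]]
patch_class = [0, 1, 1, 1]

after = project_prototypes(prototypes, proto_class, patches, patch_class)
```

Because every projected prototype coincides with an actual training patch, it can be visualized as a region of a real training image, which is what grounds the explanations in concrete cases.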