Model Guidance via Explanations Turns Image Classifiers into Segmentation Models
Xiaoyan Yu, Jannik Franzen, Wojciech Samek, Marina M.-C. Höhne, Dagmar Kainmueller
TL;DR
Heatmaps from explainable AI methods often resemble segmentation maps, enabling weakly supervised segmentation with image-level labels. The authors unify heatmap-based guidance with standard segmentation by unrolling Layer-wise Relevance Propagation (LRP) into an encoder–decoder architecture with weight tying and class-specific decoders, trained with standard segmentation losses. They show formal parallels to conventional segmentation networks and demonstrate competitive performance on PASCAL VOC 2012 across backbones, with notable gains when pixel-level labels are scarce. The approach preserves classifier performance while leveraging image-level labels to improve segmentation, offering a practical path to semi-supervised segmentation in which standard training objectives can be plugged in directly. Code is publicly available.
Abstract
Heatmaps generated on inputs of image classification networks via explainable AI methods like Grad-CAM and LRP have been observed to resemble segmentations of input images in many cases. Consequently, heatmaps have also been leveraged for achieving weakly supervised segmentation with image-level supervision. On the other hand, losses can be imposed on differentiable heatmaps, which has been shown to be useful for (1) making heatmaps more human-interpretable, (2) regularizing networks towards better generalization, (3) training diverse ensembles of networks, and (4) explicitly ignoring confounding input features. Because of the last use case, the paradigm of imposing losses on heatmaps is often referred to as "Right for the Right Reasons". We unify these two lines of research by investigating semi-supervised segmentation as a novel use case for the Right for the Right Reasons paradigm. First, we show formal parallels between differentiable heatmap architectures and standard encoder-decoder architectures for image segmentation. Second, we show that such differentiable heatmap architectures yield competitive results when trained with standard segmentation losses. Third, we show that such architectures allow for training with weak supervision in the form of image-level labels and small numbers of pixel-level labels, outperforming comparable encoder-decoder models. Code is available at https://github.com/Kainmueller-Lab/TW-autoencoder.
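To make the idea of imposing a segmentation loss on a differentiable heatmap concrete, the following toy sketch may help. It is not the authors' architecture or code: it assumes a single bias-free linear unit as the "classifier", uses the basic LRP-0 rule to turn its logit into a per-pixel heatmap through the same weights (a minimal stand-in for weight tying), and adds a per-pixel binary cross-entropy term only when a pixel mask is available. All function names, the input values, and the choice of a sigmoid on the heatmap are illustrative assumptions.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def bce(p, y, tiny=1e-12):
    # Binary cross-entropy for one probability p and a target y in {0, 1}.
    return -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))

def classify(x, w):
    # Toy "encoder": one bias-free linear unit producing a single class logit.
    return sum(xi * wi for xi, wi in zip(x, w))

def lrp_heatmap(x, w):
    # Toy "decoder": LRP-0 through the same weights w (weight tying).
    # For one bias-free linear unit, pixel i receives relevance x_i * w_i,
    # so the relevances sum to the logit (conservation property).
    return [xi * wi for xi, wi in zip(x, w)]

def loss(x, w, image_label, pixel_mask=None):
    # An image-level label is always used; a pixel mask only when present.
    total = bce(sigmoid(classify(x, w)), image_label)
    if pixel_mask is not None:
        heat = lrp_heatmap(x, w)
        seg = sum(bce(sigmoid(h), m) for h, m in zip(heat, pixel_mask))
        total += seg / len(heat)
    return total

# Hypothetical 4-"pixel" image and weights, chosen only for illustration.
x = [0.0, 1.0, 2.0, 0.5]
w = [0.3, 1.5, 1.0, -2.0]

logit = classify(x, w)
heat = lrp_heatmap(x, w)
weak = loss(x, w, image_label=1)                           # image-level label only
full = loss(x, w, image_label=1, pixel_mask=[0, 1, 1, 0])  # plus pixel-level labels
```

In this sketch the same function handles both kinds of supervision, which mirrors the semi-supervised setting described in the abstract: images with only a class label contribute the classification term, while the few images with masks additionally constrain the heatmap. A real implementation would propagate relevance through a deep network and compute gradients with an autodiff framework.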
