Towards the Characterization of Representations Learned via Capsule-based Network Architectures
Saja Tawalbeh, José Oramas
TL;DR
The paper tackles the interpretability of Capsule Networks by proposing a principled framework to test whether part-whole relationships are encoded in CapsNet representations. It introduces two methods, Perturbation Analysis and Layer-Wise Relevant Unit Selection, to probe internal activations and activation paths from input to prediction, applying them across MNIST, SVHN, CIFAR-10, and CelebA variants with CapsNetSF17 and CapsNetEM backbones. The study finds that CapsNet representations are neither as disentangled nor as strictly aligned with part-whole structures as often claimed: activation-space perturbations reveal entangled feature dimensions, and Relevance Mass Accuracy (RMA) shows low overlap between part and whole activations. These results highlight both the potential and the limitations of CapsNets for interpretable representation learning, and point to future work on routing efficiency, broader backbone evaluation, and complementary explanation methods to better understand CapsNet behavior in complex settings.
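To make the RMA measure mentioned above concrete, here is a minimal sketch of how such a score is commonly defined: the share of relevance mass that falls inside a ground-truth region. The function name `relevance_mass_accuracy`, the toy heatmap, and the choice to clip negative relevance to zero are assumptions made for illustration; the summary does not specify how the paper computes relevance maps or part masks.

```python
from typing import Sequence


def relevance_mass_accuracy(
    relevance: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
) -> float:
    """Fraction of positive relevance mass that falls inside the mask."""
    inside = 0.0
    total = 0.0
    for rel_row, mask_row in zip(relevance, mask):
        for r, m in zip(rel_row, mask_row):
            r = max(r, 0.0)  # assumption: only positive relevance counts
            total += r
            if m:
                inside += r
    # With no positive relevance at all, report zero overlap.
    return inside / total if total > 0 else 0.0


# Toy 3x3 relevance map and a mask covering the left column (hypothetical data).
heatmap = [
    [2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 3.0],
]
part_mask = [
    [True, False, False],
    [True, False, False],
    [True, False, False],
]
rma = relevance_mass_accuracy(heatmap, part_mask)
print(rma)  # 0.5
```

A score near 1 would mean almost all relevance lies inside the annotated region, while a low score corresponds to the weak part-whole overlap described above.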
Abstract
Capsule Networks (CapsNets) have been re-introduced as a more compact and interpretable alternative to standard deep neural networks. While recent efforts have demonstrated their compression capabilities, their interpretability properties have not yet been fully assessed. Here, we conduct a systematic and principled study towards assessing the interpretability of these types of networks. Moreover, we pay special attention to analyzing the extent to which part-whole relationships are indeed encoded within the learned representation. Our analysis on the MNIST, SVHN, PASCAL-part and CelebA datasets suggests that the representations encoded in CapsNets might be neither as disentangled nor as strictly related to part-whole relationships as is commonly stated in the literature.
