Deep multi-prototype capsule networks
Saeid Abbassi, Kamaledin Ghiasi-Shirazi, Ahad Harati
TL;DR
This work addresses two limitations of traditional capsule networks: difficulty handling high intra-class and intra-part variation, and difficulty scaling to greater depth. It introduces GMP-CapsNet, a deep multi-prototype capsule architecture that represents each class and image part with multiple co-group capsules, enables depth through implicit weight-sharing, and feeds DenseNet-derived features into the capsule stack. The approach yields empirical gains on MNIST, SVHN, C-Cube, CEDAR, MCYT, and UTSig, with statistically significant improvements over standard CapsNet and state-of-the-art signature-recognition methods. The results support part- and class-level multi-prototyping as a way to build robust, scalable capsule-based models for visually diverse data.
Abstract
Capsule networks are neural networks that identify image parts and hierarchically form the instantiation parameters of a whole. The goal is to perform an inverse computer graphics task, and the network parameters are the mapping weights that transform parts into a whole. Training capsule networks on complex data with high intra-class or intra-part variation is challenging. This paper presents a multi-prototype architecture that guides capsule networks to represent the variations in image parts. To this end, instead of using a single capsule for each class and part, the proposed method employs several capsules (co-group capsules) that capture multiple prototypes of an object. In the final layer, co-group capsules compete, and their soft output serves as the target for a competitive cross-entropy loss. In the middle layers, the most active capsules map to the next layer with a weight shared among the co-groups. Because this implicit weight-sharing reduces the number of parameters, it makes deeper capsule networks feasible. Experimental results on the MNIST, SVHN, C-Cube, CEDAR, MCYT, and UTSig datasets show that the proposed model outperforms the other evaluated models in image classification accuracy.
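The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is one possible reading of the abstract, not the authors' implementation. The function names, tensor shapes, the use of capsule length as the activity measure, and the choice of one shared weight matrix per co-group are all assumptions made for illustration. The loss assumes that the soft target is the softmax mass of the true class's co-group capsules, renormalized to sum to one and treated as a constant.

```python
import numpy as np


def map_most_active(capsules, shared_weights):
    """Middle-layer sketch (assumed shapes, hypothetical helper).

    capsules:       (num_groups, k, d_in)  pose vectors, k co-group capsules per part
    shared_weights: (num_groups, d_out, d_in)  one matrix shared by each co-group
    returns:        (num_groups, d_out)
    """
    # Assumption: a capsule's activity is the length of its pose vector.
    lengths = np.linalg.norm(capsules, axis=-1)            # (num_groups, k)
    winners = lengths.argmax(axis=1)                       # (num_groups,)
    # Pick the most active capsule in every co-group.
    selected = capsules[np.arange(capsules.shape[0]), winners]  # (num_groups, d_in)
    # Every member of a co-group would use the same matrix, so the parameter
    # count does not grow with k.
    return np.einsum("goi,gi->go", shared_weights, selected)


def competitive_cross_entropy(activations, label):
    """Final-layer sketch (assumed formulation, hypothetical helper).

    activations: (num_classes, k)  one activation per co-group capsule
    label:       int  index of the true class
    returns:     (loss, target)
    """
    logits = activations.ravel()
    logits = logits - logits.max()                         # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    probs = probs.reshape(activations.shape)
    # Soft target: the true class's co-group capsules compete for the mass;
    # capsules of every other class receive zero.
    target = np.zeros_like(probs)
    target[label] = probs[label] / probs[label].sum()
    # The target is treated as a constant when computing the loss.
    loss = -np.sum(target * np.log(probs + 1e-12))
    return loss, target
```

In this sketch, setting k = 1 makes the target one-hot, so the loss reduces to ordinary softmax cross-entropy. With k > 1, the co-group capsule that already responds most strongly to an input receives the largest share of the target, which is what lets different capsules settle on different prototypes of the same class.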
