Deep multi-prototype capsule networks

Saeid Abbassi, Kamaledin Ghiasi-Shirazi, Ahad Harati

TL;DR

This work addresses two limitations of traditional capsule networks: difficulty handling high intra-class and intra-part variation, and difficulty scaling to deeper architectures. It introduces GMP-CapsNet, a deep multi-prototype capsule architecture that represents each class and image part with multiple co-group capsules, enables depth through implicit weight-sharing, and feeds DenseNet-derived features into the capsule stack. The approach reports accuracy gains on MNIST, SVHN, C-Cube, CEDAR, MCYT, and UTSig, with statistically significant improvements over standard CapsNet and state-of-the-art signature-recognition methods. The results support part- and class-level multi-prototyping as a way to build robust, scalable capsule-based models for visually diverse data.

Abstract

Capsule networks are a type of neural network that identify image parts and hierarchically form the instantiation parameters of a whole. The network aims to perform an inverse computer graphics task, and its parameters are the mapping weights that transform parts into a whole. Training capsule networks on complex data with high intra-class or intra-part variation is challenging. This paper presents a multi-prototype architecture that guides capsule networks to represent the variations in image parts. To this end, instead of a single capsule for each class and part, the proposed method employs several capsules (co-group capsules) that capture multiple prototypes of an object. In the final layer, co-group capsules compete, and their soft output serves as the target for a competitive cross-entropy loss. In the middle layers, the most active capsules map to the next layer through a weight shared among the co-groups. Because this implicit weight-sharing reduces the number of parameters, it makes deeper capsule networks feasible. Experimental results on the MNIST, SVHN, C-Cube, CEDAR, MCYT, and UTSig datasets show that the proposed model outperforms the compared methods in image classification accuracy.
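
The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is one plausible reading, not the authors' implementation, because the abstract gives no equations. The assumptions are: a capsule's activity is the length of its pose vector; the soft target is the softmax distribution renormalised over the co-group of the true class; and a middle-layer co-group forwards only its most active capsule through a single shared weight matrix. All function names (`competitive_cross_entropy`, `predict_class`, `forward_co_group`) are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def competitive_cross_entropy(activations, groups, true_class):
    """Assumed form of a competitive cross-entropy loss.

    activations: one activity score per final-layer capsule.
    groups:      groups[i] is the class whose co-group capsule i belongs to.
    true_class:  label of the training example.
    """
    probs = softmax(activations)  # distribution over all capsules
    # Soft target: keep only the true class's co-group and renormalise,
    # so the most active co-group capsule receives the largest share.
    in_group = [p if g == true_class else 0.0 for p, g in zip(probs, groups)]
    mass = sum(in_group)
    target = [p / mass for p in in_group]
    # Cross-entropy between the soft target (treated as a constant) and probs.
    loss = -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)
    return loss, target


def predict_class(activations, groups):
    """Predict the class that owns the single most active capsule."""
    winner = max(range(len(activations)), key=activations.__getitem__)
    return groups[winner]


def forward_co_group(capsules, shared_w):
    """Map the most active capsule of one co-group through a shared weight.

    capsules: list of pose vectors belonging to one co-group.
    shared_w: one weight matrix (list of rows) reused by every capsule
              in the co-group, which is where the parameter saving comes from.
    """
    winner = max(capsules, key=lambda v: math.sqrt(sum(x * x for x in v)))
    return [sum(w * x for w, x in zip(row, winner)) for row in shared_w]
```

With two classes and two capsules per class, `competitive_cross_entropy([2.0, 0.1, 0.3, 1.5], [0, 0, 1, 1], 0)` assigns all target mass to the first two capsules and gives the larger share to the more active one, so each co-group capsule is free to specialise on a different prototype.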

Paper Structure

This paper contains 11 sections, 11 equations, 5 figures, and 8 tables.

Figures (5)

  • Figure 1: Examples of digits 2 and 3 in the MNIST dataset. The figure indicates the prototype variations for each class as well as for each part.
  • Figure 2: Hybrid DenseNet and multi-prototype deep capsule network architecture. In this architecture, the input of the proposed capsule network is supplied by the DenseNet features.
  • Figure 3: Synthetic face and noise dataset. The eye prototypes and other parts are depicted in the face class.
  • Figure 4: Some face samples after dropout, which is applied to prevent the network from focusing on a specific part of the image.
  • Figure 5: The results of the average eye prototype for the face class images in each of the two identified prototypes. According to these results, the multi-prototype network tends to learn different image part prototypes in the middle layer of the multi-prototype