LE-CapsNet: A Light and Enhanced Capsule Network

Pouya Shiri, Amirali Baniasadi

TL;DR

CapsNet offers viewpoint-invariant object representations but suffers from slow routing and high parameter counts, limiting scalability. LE-CapsNet introduces a Primary Capsule Generator (PCG) with multi-scale feature extraction, a Convolutional Fully Connected (CFC) layer to translate features into capsules, a stronger class-independent decoder, and capsule dropout to boost generalization. It achieves higher accuracy with far fewer parameters (3.8M) and substantially faster inference (about 4x) than CapsNet, e.g., $76.73\%$ on CIFAR-10 and $94.37\%$ on AffNIST, while maintaining competitive SVHN and Fashion-MNIST results. Overall, LE-CapsNet offers a scalable, efficient alternative to CapsNet for image classification, combining speed, accuracy, and robustness benefits across common datasets.

Abstract

The Capsule Network (CapsNet) classifier has several advantages over CNNs, including better detection of images containing overlapping categories and higher accuracy on transformed images. Despite these advantages, CapsNet is slow due to its different structure. In addition, CapsNet is resource-hungry, has many parameters, and lags behind CNNs in accuracy. In this work, we propose LE-CapsNet as a light, enhanced, and more accurate variant of CapsNet. Using 3.8M weights, LE-CapsNet obtains 76.73% accuracy on the CIFAR-10 dataset while performing inference 4x faster than CapsNet. In addition, our proposed network is more robust than CapsNet at detecting images with affine transformations. We achieve 94.3% accuracy on the AffNIST dataset (compared to 90.52% for CapsNet).

Paper Structure

This paper contains 16 sections, 1 equation, 8 figures, 3 tables.

Figures (8)

  • Figure 1: The CapsNet architecture. Vectors are formed by reshaping the features extracted by the feature extractor, then multiplied by a matrix to create primary capsules (PCs). The Dynamic Routing (DR) algorithm infers the output capsules, which are used for classification. The decoder reconstructs the input image.
  • Figure 2: CapsNet's Decoder. Consecutive FC layers are used to reconstruct the input image. All output capsules except for the correct one are masked out with zeroes.
  • Figure 3: Class-Independent Decoder for CapsNet
  • Figure 4: LE-CapsNet architecture. The network uses a PC Generator module to generate capsules out of the input image. This module produces 90% fewer capsules (108 vs. 1152) on the MNIST dataset. This reduction leads to a faster and lighter (in terms of number of parameters) network.
  • Figure 5: The CFC layer. For each stride, vectors are formed from spatially correlated regions of the output feature map.
  • ...and 3 more figures
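
Two of the captions above can be made concrete with a small sketch. The first function illustrates the masking step described for Figure 2 (every output capsule except the correct one is replaced with zeroes before the decoder's FC layers see it); the second checks the capsule reduction quoted for Figure 4. The function names, the toy capsule sizes, and the flattening step are assumptions chosen for illustration, not the authors' implementation.

```python
def mask_output_capsules(capsules, correct_class):
    """Return a copy of `capsules` where every capsule vector except the
    one at index `correct_class` is replaced with zeroes."""
    return [
        list(vec) if idx == correct_class else [0.0] * len(vec)
        for idx, vec in enumerate(capsules)
    ]


def flatten(capsules):
    """Concatenate capsule vectors into one flat list (assumed decoder input)."""
    return [x for vec in capsules for x in vec]


def capsule_reduction(baseline, reduced):
    """Fraction of capsules removed when going from `baseline` to `reduced`."""
    return 1.0 - reduced / baseline


# Toy example: 3 classes with 4-dimensional capsules (hypothetical sizes).
caps = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
masked = mask_output_capsules(caps, correct_class=1)
decoder_input = flatten(masked)

# Capsule counts on MNIST taken from the Figure 4 caption: 108 vs. 1152.
reduction = capsule_reduction(1152, 108)
```

Here `decoder_input` keeps only the four values of the class-1 capsule, surrounded by zeroes, and `reduction` evaluates to 0.90625, which matches the roughly 90% reduction stated in the Figure 4 caption.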