LE-CapsNet: A Light and Enhanced Capsule Network
Pouya Shiri, Amirali Baniasadi
TL;DR
CapsNet offers viewpoint-invariant object representations but suffers from slow routing and high parameter counts, limiting scalability. LE-CapsNet introduces a Primary Capsule Generator (PCG) with multi-scale feature extraction, a Convolutional Fully Connected (CFC) layer to translate features into capsules, a stronger class-independent decoder, and capsule dropout to boost generalization. It achieves higher accuracy with far fewer parameters (3.8M) and substantially faster inference (about 4x) than CapsNet, e.g., $76.73\%$ on CIFAR-10 and $94.37\%$ on AffNIST, while maintaining competitive SVHN and Fashion-MNIST results. Overall, LE-CapsNet offers a scalable, efficient alternative to CapsNet for image classification, combining speed, accuracy, and robustness benefits across common datasets.
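As a rough illustration of the capsule dropout idea mentioned above, the following NumPy sketch zeroes out whole capsule vectors rather than individual scalar activations. This is an assumption about how such a layer might work, not the authors' implementation: the function name `capsule_dropout`, the `(batch, num_capsules, capsule_dim)` layout, and the choice not to rescale the surviving capsules are all hypothetical.

```python
import numpy as np


def capsule_dropout(capsules, drop_prob, rng, training=True):
    """Drop entire capsules (hypothetical sketch, not the paper's code).

    capsules:  array of shape (batch, num_capsules, capsule_dim)
    drop_prob: probability of zeroing each capsule, in [0, 1)
    rng:       a numpy.random.Generator used to sample the mask
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return capsules

    batch, num_capsules, _ = capsules.shape
    # One Bernoulli draw per capsule; the trailing axis of size 1 broadcasts
    # the decision across every element of that capsule's vector.
    keep_mask = rng.random((batch, num_capsules, 1)) >= drop_prob
    # Assumption: surviving capsules are left unscaled so that their vector
    # lengths are not altered.
    return capsules * keep_mask


rng = np.random.default_rng(0)
primary_capsules = np.ones((2, 5, 4))
dropped = capsule_dropout(primary_capsules, drop_prob=0.5, rng=rng)
```

Because the mask has one entry per capsule, each capsule in `dropped` is either entirely zero or identical to its input, which is what distinguishes this from ordinary element-wise dropout.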
Abstract
The Capsule Network (CapsNet) classifier has several advantages over CNNs, including better detection of images containing overlapping categories and higher accuracy on transformed images. Despite these advantages, CapsNet is slow because its structure differs from that of CNNs. CapsNet is also resource-hungry, has many parameters, and lags behind CNNs in accuracy. In this work, we propose LE-CapsNet, a light, enhanced, and more accurate variant of CapsNet. Using 3.8M weights, LE-CapsNet obtains 76.73% accuracy on the CIFAR-10 dataset while performing inference 4x faster than CapsNet. In addition, our proposed network is more robust than CapsNet on images with affine transformations: it achieves 94.3% accuracy on the AffNIST dataset, compared to 90.52% for CapsNet.
