DL-CapsNet: A Deep and Light Capsule Network
Pouya Shiri, Amirali Baniasadi
TL;DR
DL-CapsNet tackles the trade-off between representation power and parameter efficiency in Capsule Networks by introducing a deep architecture built on a Multi-Level Capsule Extractor (MLCE) and Capsule Summarization (CapsSum). It combines 3D Dynamic Routing with a class-independent decoder to maintain accuracy while keeping the parameter count low (~6.8M) and enabling fast training and inference. The model achieves competitive results on CIFAR-10 and CIFAR-100: 91.29% accuracy on CIFAR-10 using a 7-ensemble setup and 68.36% on CIFAR-100, which indicates that it scales to datasets with many classes. Overall, DL-CapsNet shows that deeper capsule architectures can be both efficient and effective for complex visual classification tasks.
Abstract
Capsule Network (CapsNet) is a promising classifier and a possible successor to classifiers built on Convolutional Neural Networks (CNNs). CapsNet is more accurate than CNNs at classifying images with overlapping categories and images to which affine transformations have been applied. In this work, we propose a deep variant of CapsNet consisting of several capsule layers. In addition, we design the Capsule Summarization layer, which reduces complexity by reducing the number of parameters. DL-CapsNet is highly accurate while employing a small number of parameters and delivering faster training and inference. DL-CapsNet can process complex datasets with a high number of categories.
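To give a rough sense of why reducing the number of capsules lowers the parameter count, the sketch below counts the transformation weights of a fully connected capsule layer, where each (input capsule, output capsule) pair has its own weight matrix. The baseline sizes are those of the original CapsNet on MNIST (1152 primary capsules of 8 dimensions routed to 10 class capsules of 16 dimensions). The 4x reduction factor and the function name `fc_capsule_params` are hypothetical and chosen only for illustration; they are not taken from DL-CapsNet, and this is not a description of how the Capsule Summarization layer works internally.

```python
def fc_capsule_params(n_in: int, d_in: int, n_out: int, d_out: int) -> int:
    """Count transformation weights in a fully connected capsule layer.

    Each (input capsule, output capsule) pair owns one d_in x d_out matrix.
    """
    return n_in * n_out * d_in * d_out


# Baseline: original CapsNet on MNIST,
# 1152 primary capsules (8-D) routed to 10 class capsules (16-D).
baseline = fc_capsule_params(n_in=1152, d_in=8, n_out=10, d_out=16)

# Hypothetical assumption: the capsules feeding the class layer are
# reduced 4x in number before routing. The factor 4 is illustrative only.
summarized = fc_capsule_params(n_in=1152 // 4, d_in=8, n_out=10, d_out=16)

print(baseline)    # 1474560
print(summarized)  # 368640
```

Because the weight count is linear in the number of input capsules, any step that cuts the number of capsules entering a fully connected capsule layer cuts that layer's weights by the same factor.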
