Quick-CapsNet (QCN): A fast alternative to Capsule Networks
Pouya Shiri, Ramin Sharifi, Amirali Baniasadi
TL;DR
The paper tackles the slow inference of Capsule Networks by introducing Quick-CapsNet (QCN), which replaces the second convolution with a small fully connected layer and thereby cuts the number of primary capsules (PCs) to as few as 4–8. A variant, QCN+, replaces the default decoder with a stronger one built from deconvolution layers, recovering accuracy while using fewer parameters. Across MNIST, F-MNIST, SVHN, CIFAR-10, and AffNIST, QCN achieves substantial speedups (up to about 10x during training and up to 7x during inference) with only marginal drops in accuracy, while QCN+ further improves reconstruction performance at the cost of some speed. The study also assesses robustness to affine transformations and finds that QCN preserves it while running notably faster. Overall, QCN offers a practical, real-time-capable alternative to CapsNet with meaningful reductions in parameters and computational load.
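To give a feel for why fewer primary capsules matter, the following is a rough back-of-the-envelope sketch, not the authors' code. It assumes the dimensions of the original CapsNet on 28x28 MNIST (32 capsule channels on a 6x6 grid, 8-D primary capsules, 10 classes, 16-D output capsules) and assumes QCN keeps the same capsule dimensions; neither is stated in this summary. The helper `routing_weights` is a hypothetical name, and it counts only the capsule-to-class transformation weights, not the fully connected layer that QCN adds.

```python
# Rough count of the transformation weights between primary and class capsules.
# All dimensions below are assumptions taken from the original CapsNet setup.

def routing_weights(num_primary_caps, num_classes=10, in_dim=8, out_dim=16):
    """One (in_dim x out_dim) matrix per (primary capsule, class) pair."""
    return num_primary_caps * num_classes * in_dim * out_dim

capsnet_pcs = 32 * 6 * 6        # primary capsules in the baseline CapsNet
qcn_pcs_options = (4, 8)        # range quoted in the TL;DR

baseline = routing_weights(capsnet_pcs)
for pcs in qcn_pcs_options:
    reduced = routing_weights(pcs)
    print(f"{pcs} PCs: {reduced} weights vs {baseline} "
          f"({baseline // reduced}x fewer)")
```

The ratio printed here is only a parameter count for one layer. The measured speedups reported above are much smaller than this ratio, since the rest of the network still has to run.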
Abstract
The basic computational unit in a Capsule Network (CapsNet) is the capsule, in contrast to the neuron in Convolutional Neural Networks (CNNs). A capsule is a set of neurons that together form a vector. CapsNet is used for supervised classification and has achieved state-of-the-art accuracy on the MNIST digit recognition dataset, outperforming conventional CNNs at detecting overlapping digits. Moreover, CapsNet is more robust to affine transformations than CNNs on MNIST datasets. One drawback of CapsNet, however, is slow training and testing, which can be a bottleneck for applications that require a fast network, especially during inference. In this work, we introduce Quick-CapsNet (QCN) as a fast alternative to CapsNet and a starting point for developing CapsNet for fast real-time applications. QCN produces fewer capsules, which results in a faster network, at the cost of a marginal loss in accuracy. Inference is 5x faster on the MNIST, F-MNIST, SVHN and CIFAR-10 datasets. We further enhance QCN by replacing the default decoder with a more powerful one.
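As an illustration of a capsule being a vector rather than a scalar, here is a minimal sketch of the squashing nonlinearity from the original CapsNet formulation. The abstract does not spell this function out, so treat it as background under that assumption. The function names and the `eps` guard are choices made for this example.

```python
import math

def squash(s, eps=1e-9):
    """Shrink vector s to a length in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    norm = math.sqrt(sq_norm)
    # Scale factor: ||s||^2 / (1 + ||s||^2), divided by ||s|| to normalize.
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in s]

def length(v):
    """Euclidean length of a capsule's output vector."""
    return math.sqrt(sum(x * x for x in v))

weak = squash([0.1, 0.0, 0.0, 0.0])     # short input stays close to zero
strong = squash([3.0, 4.0, 0.0, 0.0])   # long input approaches length 1
print(length(weak), length(strong))
```

In that formulation the length of the squashed vector is read as the probability that the entity the capsule represents is present, and its direction encodes the entity's properties.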
