Quick-CapsNet (QCN): A fast alternative to Capsule Networks

Pouya Shiri, Ramin Sharifi, Amirali Baniasadi

TL;DR

The paper tackles the slow inference of Capsule Networks by introducing Quick-CapsNet (QCN), which replaces the second convolution layer with a small Fully Connected (FC) layer and thereby cuts the number of primary capsules (PCs) to as few as 4–8. A stronger decoder, built from deconvolution layers, is proposed as QCN+ to recover accuracy with fewer parameters. Across MNIST, F-MNIST, SVHN, CIFAR-10, and AffNIST, QCN achieves substantial speedups (up to about 10x during training and up to 7x during inference) with only marginal drops in accuracy, while QCN+ further improves reconstruction performance at the cost of some speed. The study also assesses robustness to affine transformations and finds that QCN preserves it while running notably faster. Overall, QCN offers a practical, real-time-capable alternative to CapsNet with meaningful reductions in parameters and computational load.

Abstract

The basic computational unit in a Capsule Network (CapsNet) is a capsule, as opposed to the neurons of Convolutional Neural Networks (CNNs). A capsule is a set of neurons that form a vector. CapsNet is used for supervised classification of data and has achieved state-of-the-art accuracy on the MNIST digit recognition dataset, outperforming conventional CNNs in detecting overlapping digits. Moreover, CapsNet shows higher robustness to affine transformations than CNNs on MNIST datasets. One of the drawbacks of CapsNet, however, is slow training and testing. This can be a bottleneck for applications that require a fast network, especially during inference. In this work, we introduce Quick-CapsNet (QCN) as a fast alternative to CapsNet, which can be a starting point for developing CapsNet for fast real-time applications. QCN builds on producing fewer capsules, which results in a faster network. QCN achieves this at the cost of a marginal loss in accuracy. Inference is 5x faster on the MNIST, F-MNIST, SVHN and CIFAR-10 datasets. We also enhance QCN by replacing the default decoder with a more powerful one.
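To make the link between capsule count and cost concrete, the following is a rough sketch of how the size of one capsule routing layer scales with the number of primary capsules. The layer dimensions (8-D primary capsules, 16-D output capsules, 10 classes) and the baseline count of 1152 primary capsules are assumptions taken from the original CapsNet design for 28x28 inputs, not from this summary, and the function name `routing_cost` is hypothetical. The sketch counts weights and capsule pairs only; it does not predict the measured speedups.

```python
def routing_cost(num_primary_caps, num_classes=10, pc_dim=8, out_dim=16):
    """Return (capsule_pairs, transform_weights) for one routing layer.

    Each primary capsule sends one prediction vector to every output
    capsule, so the number of pairs grows linearly with the capsule count.
    Each pair owns a pc_dim x out_dim transformation matrix.
    All default dimensions are assumptions, not values from the paper.
    """
    pairs = num_primary_caps * num_classes
    weights = pairs * pc_dim * out_dim
    return pairs, weights


# Assumed baseline: 1152 primary capsules (original CapsNet on 28x28 inputs).
baseline_pairs, baseline_weights = routing_cost(1152)
print("baseline:", baseline_pairs, "pairs,", baseline_weights, "weights")

# QCN-style settings: as few as 4-8 primary capsules.
for n in (4, 8):
    pairs, weights = routing_cost(n)
    print(n, "PCs:", pairs, "pairs,", weights, "weights,",
          "reduction factor", baseline_weights // weights)
```

Because every routing iteration touches each capsule pair, shrinking the pair count is one plausible reason the routing step becomes cheaper, although end-to-end timings also depend on the convolution, FC, and decoder layers.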

Paper Structure

This paper contains 9 sections, 1 equation, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Exploring the effect of the number of PCs on the network speed for the CIFAR-10 dataset. Training time is divided by 10. Note that the network becomes slower as the number of PCs increases.
  • Figure 2: CapsNet architecture. The input image goes through two convolution layers. Then, after being reshaped, it enters the primary capsule layer. At the end, the output is the vector with the largest magnitude among the vectors in the digit caps layer.
  • Figure 3: CapsNet reconstruction sub-network (Sabour et al., 2017). Input images are reconstructed to add a new term to the loss.
  • Figure 4: QCN architecture. Compared to the baseline, notice that the second Convolution layer is replaced with an FC layer.
  • Figure 5: Network training speed in QCN and QCN+ compared to the baseline. The training time is shown for 4 datasets. QCN is significantly faster in training compared to the baseline CapsNet. QCN+ is slower than QCN due to the use of deconvolution layers.
  • ...and 1 more figure
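
The output rule described in the Figure 2 caption (pick the digit capsule whose vector has the largest magnitude) can be sketched as follows. This is an illustrative example only: the function name `predict_class` and the toy 3-D vectors are hypothetical, and real digit capsules would be higher-dimensional outputs of the routing step.

```python
import math


def predict_class(digit_caps):
    """Return (predicted_index, lengths) for a list of per-class capsule vectors.

    The predicted class is the index of the capsule with the largest
    Euclidean norm, matching the rule stated in the Figure 2 caption.
    """
    lengths = [math.sqrt(sum(x * x for x in vec)) for vec in digit_caps]
    best = max(range(len(lengths)), key=lengths.__getitem__)
    return best, lengths


# Toy capsule outputs for three classes (made-up values for illustration).
toy_caps = [
    [0.1, 0.0, 0.2],
    [0.6, 0.3, 0.5],
    [0.2, 0.2, 0.1],
]
label, lengths = predict_class(toy_caps)
print("predicted class:", label)
```

In this toy input the second capsule has the longest vector, so index 1 is selected.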