DL-CapsNet: A Deep and Light Capsule Network

Pouya Shiri, Amirali Baniasadi

TL;DR

DL-CapsNet addresses the trade-off between representational power and parameter efficiency in Capsule Networks. It does so with a deep architecture built around a Multi-Level Capsule Extractor (MLCE) and a Capsule Summarization (CapsSum) layer. It combines 3D dynamic routing with a class-independent decoder, which preserves accuracy while keeping the parameter count low (~6.8M) and training and inference fast. The model achieves competitive results on CIFAR-10 and CIFAR-100: 91.29% accuracy on CIFAR-10 with a 7-model ensemble and 68.36% on CIFAR-100, demonstrating that it scales to datasets with many classes. Overall, DL-CapsNet shows that deeper capsule architectures can be both efficient and effective for complex image classification.
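The class-independent decoder is what keeps the reconstruction network small as the number of classes grows. The sketch below assumes DL-CapsNet follows the DeepCaps-style design (Rajasegaran et al.). In that design, only the selected class capsule is fed to the decoder: the ground-truth class during training, or the longest capsule at inference. This replaces the full masked vector of all class capsules. The function names, the 16-D capsule size, and the 512-unit first decoder layer are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np

def class_independent_decoder_input(class_caps, labels=None):
    """Select one capsule per example as the decoder input (hypothetical helper).

    class_caps: (batch, num_classes, caps_dim) output class capsules.
    labels: optional (batch,) integer targets, used during training;
            if None, the longest capsule (the predicted class) is used.
    Returns (batch, caps_dim): the size does not depend on num_classes.
    """
    lengths = np.linalg.norm(class_caps, axis=-1)           # (batch, num_classes)
    idx = labels if labels is not None else np.argmax(lengths, axis=-1)
    return class_caps[np.arange(class_caps.shape[0]), idx]  # (batch, caps_dim)

def decoder_first_layer_weights(num_classes, caps_dim, hidden, class_independent=True):
    """Weight count (no bias) of the decoder's first dense layer."""
    in_features = caps_dim if class_independent else num_classes * caps_dim
    return in_features * hidden

# Compare a masked decoder (input = all class capsules) with a class-independent one.
for k in (10, 100):
    print(k, decoder_first_layer_weights(k, 16, 512, False),
          decoder_first_layer_weights(k, 16, 512, True))
# 10 81920 8192
# 100 819200 8192
```

With a masked decoder, the first layer grows linearly with the number of classes. With the class-independent input, it stays fixed, which matters for a dataset such as CIFAR-100.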

Abstract

Capsule Network (CapsNet) is a promising classifier and a possible successor to classifiers built on Convolutional Neural Networks (CNNs). CapsNet is more accurate than CNNs at recognizing images in which objects of different categories overlap and images to which affine transformations have been applied. In this work, we propose a deep variant of CapsNet consisting of several capsule layers. In addition, we design the Capsule Summarization layer, which reduces complexity by lowering the number of parameters. DL-CapsNet is highly accurate, uses a small number of parameters, and delivers faster training and inference. DL-CapsNet can process complex datasets with a large number of categories.

Paper Structure

This paper contains 19 sections, 1 equation, 5 figures, 1 table.

Figures (5)

  • Figure 1: The architecture of a CapsCell with $K=3$, $D=4$ and $N_v=32$. This unit contains several ConvCaps layers and a skip connection. For the 3DR CapsCells, the skip connection performs the 3D dynamic routing operation.
  • Figure 2: The 3D routing method. Each capsule in layer $l$ predicts $c^{l+1}$ capsules. As a result, each capsule in layer $l+1$ receives $c^l$ predictions (Rajasegaran et al.).
  • Figure 3: The capsule summarization layer. A total of $w \times w \times 5$ generated capsules are summarized into $w \times w \times D_{out}$ primary capsules using $w \times w$ fully connected (FC) layers. The first FC layer is shown.
  • Figure 4: DL-CapsNet architecture. The network includes two CapsCells and the MLCE module.
  • Figure 5: The architecture of MLCE module. The module consists of two 3DR CapsCells and two CapsSum layers.
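Figure 2 states that each capsule in layer $l$ predicts $c^{l+1}$ capsules, so every capsule in layer $l+1$ combines $c^l$ predictions. The sketch below illustrates that bookkeeping at each spatial location. It applies routing-by-agreement (Sabour et al., 2017) with a softmax over output capsule types, and it shares one transformation across all input capsule types, mimicking a 3D convolution with a 1x1 spatial kernel. These are simplifying assumptions. DeepCaps/DL-CapsNet 3D routing uses a 3x3 spatial kernel, and its exact normalization may differ. All function names and sizes are hypothetical.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """CapsNet squashing: keeps direction, maps vector length into [0, 1)."""
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def predict(u, weights):
    """u: (H, W, C_in, n_in); weights: (C_out, n_out, n_in), shared by all C_in
    input types. Returns u_hat: (H, W, C_in, C_out, n_out), i.e. every input
    capsule predicts all C_out output capsules at its location."""
    return np.einsum('jkd,hwid->hwijk', weights, u)

def route(u_hat, iterations=3):
    """Routing-by-agreement at each location; each output capsule j
    aggregates the C_in predictions it receives."""
    b = np.zeros(u_hat.shape[:-1])                          # logits (H, W, C_in, C_out)
    for _ in range(iterations):
        c = softmax(b, axis=-1)                             # coupling coefficients
        s = np.einsum('hwij,hwijk->hwjk', c, u_hat)         # weighted sum over inputs
        v = squash(s)                                       # (H, W, C_out, n_out)
        b = b + np.einsum('hwijk,hwjk->hwij', u_hat, v)     # agreement update
    return v

rng = np.random.default_rng(0)
u = squash(rng.normal(size=(4, 4, 8, 4)))          # 4x4 grid, 8 capsule types of dim 4
weights = rng.normal(scale=0.5, size=(6, 8, 4))    # 6 output types of dim 8
v = route(predict(u, weights))
print(v.shape)  # (4, 4, 6, 8)
```

Sharing `weights` across input capsule types is what keeps a 3D-convolution-style transform cheaper than learning a separate matrix for every input/output pair.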
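Figure 3 describes Capsule Summarization as $w \times w$ separate FC layers. Each layer maps the 5 capsules generated at one location to $D_{out}$ primary capsules. The sketch below is one reading of that caption. It assumes input and output capsules share the same dimension `n`, that each FC layer has a bias, and that outputs are squashed. The names `init_capssum`, `capssum`, `capssum_param_count` and the sizes used are hypothetical, not taken from the paper.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def init_capssum(w, caps_in, d_out, n, rng):
    """One independent FC layer per spatial location (w*w layers in total)."""
    weights = rng.normal(scale=0.1, size=(w, w, caps_in * n, d_out * n))
    bias = np.zeros((w, w, d_out * n))
    return weights, bias

def capssum(x, weights, bias):
    """x: (w, w, caps_in, n) generated capsules -> (w, w, d_out, n) primary capsules."""
    w1, w2, caps_in, n = x.shape
    flat = x.reshape(w1, w2, caps_in * n)                    # concatenate capsules per location
    out = np.einsum('hwi,hwio->hwo', flat, weights) + bias   # location-specific FC layer
    return squash(out.reshape(w1, w2, -1, n))

def capssum_param_count(w, caps_in, d_out, n):
    """Weights plus biases of all w*w FC layers."""
    return w * w * (caps_in * n * d_out * n + d_out * n)

rng = np.random.default_rng(0)
wts, b = init_capssum(w=4, caps_in=5, d_out=2, n=8, rng=rng)
y = capssum(rng.normal(size=(4, 4, 5, 8)), wts, b)
print(y.shape, capssum_param_count(4, 5, 2, 8))  # (4, 4, 2, 8) 10496
```

Under these assumptions, the layer turns $w \times w \times 5$ capsules into $w \times w \times D_{out}$ capsules. Any later layer whose size scales with the number of input capsules therefore shrinks by a factor of $D_{out}/5$.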