OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning

Xinyu Geng; Jiaming Wang; Jiawei Gong; Yuerong Xue; Jun Xu; Fanglin Chen; Xiaolin Huang

TL;DR

An Orthogonal Capsule Network (OrthCaps) is proposed to reduce redundancy, improve routing performance, and decrease parameter counts. To the authors' knowledge, it is the first approach to use Householder orthogonal decomposition to enforce orthogonality in CapsNet.

Abstract

Redundancy is a persistent challenge in Capsule Networks (CapsNet), leading to high computational costs and parameter counts. Although previous works have introduced pruning after the initial capsule layer, the fully connected nature of dynamic routing and its non-orthogonal weight matrices reintroduce redundancy in deeper layers. Moreover, dynamic routing must iterate to converge, which further increases computational demands. In this paper, we propose an Orthogonal Capsule Network (OrthCaps) to reduce redundancy, improve routing performance, and decrease parameter counts. First, an efficient pruned capsule layer is introduced to discard redundant capsules. Second, dynamic routing is replaced with orthogonal sparse attention routing, eliminating the need for iterations and fully connected structures. Finally, the weight matrices used during routing are orthogonalized to sustain low capsule similarity; to the best of our knowledge, this is the first approach to introduce orthogonality into CapsNet. Our experiments on baseline datasets affirm the efficiency and robustness of OrthCaps in classification tasks, and ablation studies validate the importance of each component. Remarkably, OrthCaps-Shallow outperforms other Capsule Network benchmarks on four datasets while using only 110k parameters, a mere 1.25% of a standard Capsule Network's total. To the best of our knowledge, it achieves the smallest parameter count among existing Capsule Networks. Similarly, OrthCaps-Deep demonstrates competitive performance across four datasets, using only 1.2% of the parameters required by its counterparts.
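
As an illustration of how a Householder construction can yield an orthogonal weight matrix, the following minimal NumPy sketch multiplies a sequence of Householder reflections $H_i = I - 2 v_i v_i^\top / (v_i^\top v_i)$. It is not taken from the paper: the function name `householder_orthogonal`, the random vectors, and the 4x4 size are assumptions made only for this example, and the parameterization actually used in OrthCaps may differ.

```python
import numpy as np


def householder_orthogonal(vectors):
    """Return the product of Householder reflections built from the rows of `vectors`.

    Each reflection is H = I - 2 v v^T / (v^T v). Every H is orthogonal,
    so their product is orthogonal as well. (Hypothetical helper, not the
    paper's implementation.)
    """
    vectors = np.asarray(vectors, dtype=np.float64)
    dim = vectors.shape[1]
    q = np.eye(dim)
    for v in vectors:
        norm_sq = v @ v
        if norm_sq == 0.0:
            raise ValueError("Householder vectors must be non-zero")
        q = q @ (np.eye(dim) - 2.0 * np.outer(v, v) / norm_sq)
    return q


# Assumed toy setup: four random 4-dimensional vectors.
rng = np.random.default_rng(0)
Q = householder_orthogonal(rng.standard_normal((4, 4)))

# Orthogonality check: Q^T Q should equal the identity up to rounding.
print(np.allclose(Q.T @ Q, np.eye(4)))
```

Because the vectors $v_i$ are unconstrained, a matrix built this way stays orthogonal for any non-zero values of $v_i$, which is one reason such constructions are attractive when the parameters are learned by gradient descent.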

Paper Structure (28 sections, 13 equations, 8 figures, 6 tables, 2 algorithms)

Introduction
Related Work
Methodology
Overall Architecture
Pruned Capsule Layer
Routing Algorithm
Orthogonalization
Orthogonalization of Weight Matrices
Householder Orthogonalization
Experiments
Experimental Setup
Classification Performance Comparison
Ablation Study
Orthogonal Attention Routing
Pruned Capsule Layer
...and 13 more sections

Figures (8)

Figure 1: Dynamic routing mechanism. $u_i$ and $v_j$ are the lower-level and higher-level capsules, respectively. $W$ is the weight matrix that produces the pose prediction $\hat{u}_i$ of $u_i$ for the next level. $b_{ij}$ is a temporary variable used to calculate the coupling coefficient $c_{ij}$.
Figure 2: Left: In the primary capsule layer of CapsNet, 48.2% of capsule pairs have cosine similarities greater than 0.65, indicating significant redundancy among capsules. Right: After introducing the Pruned Layer, capsule similarities effectively decrease. (Detailed in Section 3.2)
Figure 3: (a) For the CIFAR10 classification task, the OrthCaps-D model comprises 7 capsule blocks, each with 3 capsule layers, interconnected via shortcut connections and orthogonal sparse attention routing. (b) The OrthCaps-S model contains two capsule layers for CIFAR10 and does not use any capsule layer for MNIST. These layers are linked through simplified attention routing.
Figure 4: Orthogonal self-attention routing.
Figure 5: The computation process of the Householder method.
...and 3 more figures
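
The redundancy statistic quoted in the Figure 2 caption (the share of capsule pairs whose cosine similarity exceeds 0.65) can be computed along the following lines. This is a sketch under stated assumptions rather than the authors' measurement code: the helper name `redundant_pair_fraction`, the use of signed (not absolute) cosine similarity, and the toy capsule array are all hypothetical.

```python
import numpy as np


def redundant_pair_fraction(capsules, threshold=0.65):
    """Fraction of distinct capsule pairs with cosine similarity above `threshold`.

    `capsules` has shape (num_capsules, capsule_dim). (Hypothetical helper.)
    """
    caps = np.asarray(capsules, dtype=np.float64)
    norms = np.linalg.norm(caps, axis=1, keepdims=True)
    unit = caps / np.clip(norms, 1e-12, None)  # guard against zero-length capsules
    sim = unit @ unit.T                        # pairwise cosine similarities
    i, j = np.triu_indices(len(caps), k=1)     # each unordered pair counted once
    return float(np.mean(sim[i, j] > threshold))


# Assumed toy data: two parallel capsules and one orthogonal capsule.
demo = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
fraction = redundant_pair_fraction(demo)
print(fraction)  # one of the three pairs exceeds the threshold
```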
