The Convergence of Dynamic Routing between Capsules

Daoyuan Ye, Juntao Li, Yiting Shen

TL;DR

This work analyzes capsule networks with dynamic routing by casting routing as optimization of a concave energy $E(C)$ under linear constraints. It shows the routing procedure is a nonlinear gradient descent on a matrix-derivative formulation, and provides convergence proofs (energy decrease and constrained optimality). Two experiments visualize convergence behavior and polarization of capsule activations, revealing that increasing routing iterations can polarize or mute several capsules. The findings expose polarization as a major limitation and motivate developing routing algorithms with stronger theoretical guarantees and stability for practical use.

Abstract

Capsule networks (CapsNet) are recently proposed neural network models with new processing layers, designed specifically for entity representation and discovery in images. It is well known that CapsNet has some advantages over traditional neural networks, especially in generalization capability. At the same time, some studies report negative experimental results. The causes of this contradiction have not been thoroughly analyzed. Preliminary experimental results show that routing algorithms do not always produce the good results expected, and in most cases different routing algorithms do not change the classification results but simply polarize the link strengths, especially when they keep iterating without stopping. To realize the true potential of CapsNet, a deep mathematical analysis of the routing algorithms is crucial. In this paper, we give the objective function that is minimized by the dynamic routing algorithm, which is a concave function. The dynamic routing algorithm can be regarded as a nonlinear gradient method for solving an optimization problem under linear constraints, and its convergence can be proved rigorously. Furthermore, a mathematically rigorous proof of convergence is given for this class of iterative routing procedures. We analyze in detail the relation between the objective function and the constraints solved by the dynamic routing algorithm, and perform a corresponding routing experiment to illustrate the convergence result.
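The iterative procedure discussed in the abstract can be illustrated with a minimal NumPy sketch. It follows the commonly described routing-by-agreement loop (softmax coupling coefficients, weighted sum, squash, agreement update), which may differ in detail from the formulation analyzed in the paper. The function names, the tensor shapes, and the random prediction vectors `u_hat` are assumptions made only for illustration, and the paper's objective function is not computed here.

```python
import numpy as np

def squash(s, eps=1e-9):
    # Shrink each vector so its length lies in [0, 1) while keeping its direction.
    norm2 = np.sum(s * s, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def softmax(b, axis):
    # Numerically stable softmax along the given axis.
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat, num_iters):
    # u_hat: predictions with shape (n_in, n_out, dim) (hypothetical layout).
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                     # routing logits
    for _ in range(num_iters):
        c = softmax(b, axis=1)                      # each row sums to 1 (linear constraint)
        s = np.einsum("ij,ijd->jd", c, u_hat)       # weighted sum per output capsule
        v = squash(s)                               # output capsule vectors
        b = b + np.einsum("ijd,jd->ij", u_hat, v)   # agreement update
    return c, v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(8, 3, 4))                  # illustrative random predictions
c1, v1 = dynamic_routing(u_hat, num_iters=1)
c50, v50 = dynamic_routing(u_hat, num_iters=50)

# Mean of the largest coupling coefficient per input capsule; values near 1
# indicate that each input routes almost entirely to a single output capsule.
print("mean max coupling after 1 iteration:  ", c1.max(axis=1).mean())
print("mean max coupling after 50 iterations:", c50.max(axis=1).mean())
```

With a single iteration the coupling coefficients are uniform by construction, so comparing the two printed values gives a rough indication of how far the link strengths have moved away from uniform as the routing keeps iterating.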

Paper Structure

This paper contains 14 sections, 27 equations, 2 figures, 2 algorithms.

Figures (2)

  • Figure 1: The values of the objective function $\Psi$ (Equation \ref{Psidef}) for all capsules and for each capsule during the routing process.
  • Figure 2: The distribution map of the inputs to layer $l + 1$. Sparse points represent the prediction values of different capsules. The asterisks mark the final output of the dynamic routing algorithm for each capsule.