The Convergence of Dynamic Routing between Capsules
Daoyuan Ye, Juntao Li, Yiting Shen
TL;DR
This work analyzes capsule networks with dynamic routing by casting routing as the minimization of a concave energy $E(C)$ under linear constraints. It shows that the routing procedure is a nonlinear gradient method expressed through matrix derivatives, and provides convergence proofs (energy decrease and constrained optimality). Two experiments visualize the convergence behavior and the polarization of capsule activations, showing that increasing the number of routing iterations can polarize or mute several capsules. The findings identify polarization as a major limitation and motivate routing algorithms with stronger theoretical guarantees and better stability in practice.
Abstract
Capsule networks (CapsNets) are recently proposed neural network models with new processing layers designed for entity representation and discovery in images. CapsNets are known to have some advantages over traditional neural networks, especially in generalization capability. At the same time, some studies report negative experimental results, and the causes of this contradiction have not been thoroughly analyzed. Preliminary experimental results show that routing algorithms do not always behave as expected: in most cases, different routing algorithms do not change the classification results but simply polarize the link strengths, especially when the iterations continue without stopping. To realize the true potential of CapsNets, a deep mathematical analysis of the routing algorithms is crucial. In this paper, we give the objective function that is minimized by the dynamic routing algorithm, which is a concave function. The dynamic routing algorithm can be regarded as a nonlinear gradient method for solving an optimization problem under linear constraints, and we give a mathematically rigorous proof of convergence for this class of iterative routing procedures. We analyze in detail the relation between the objective function and the constraints handled by the dynamic routing algorithm, and perform corresponding routing experiments to illustrate our convergence results.
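To make the routing procedure and the polarization effect concrete, here is a minimal sketch of routing-by-agreement as it is commonly described for CapsNets. It is an illustration under stated assumptions, not the energy formulation analyzed in this paper: the names `squash`, `softmax`, `dynamic_routing`, and `u_hat` are hypothetical, the toy input is chosen by hand, and reading the linear constraints as "each row of the coupling matrix sums to 1" is an assumption.

```python
import math


def squash(s):
    """Squash nonlinearity: keeps the direction of s, maps its norm into [0, 1)."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0 for _ in s]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]


def softmax(row):
    """Numerically stable softmax over one row of routing logits."""
    m = max(row)
    e = [math.exp(x - m) for x in row]
    z = sum(e)
    return [x / z for x in e]


def dynamic_routing(u_hat, num_iters):
    """Routing-by-agreement sketch (requires num_iters >= 1).

    u_hat[i][j] is the prediction vector from lower capsule i to upper capsule j.
    Returns the couplings c used in the last iteration and the outputs v.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    c, v = None, None
    for _ in range(num_iters):
        # Coupling coefficients: each row sums to 1 (assumed linear constraint).
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_in))
                 for k in range(dim)]
            v.append(squash(s))
        # Agreement update: raise logits where prediction and output align.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][k] * v[j][k] for k in range(dim))
    return c, v


# Hand-picked toy input: 2 lower capsules, 2 upper capsules, 1-D vectors.
u_hat = [[[2.0], [0.5]], [[2.0], [0.5]]]
for iters in (1, 3, 5):
    c, _ = dynamic_routing(u_hat, iters)
    print(iters, [round(x, 3) for x in c[0]])
```

On this toy input the couplings start uniform and drift toward the upper capsule that receives the larger predictions as the iteration count grows, which is the kind of polarization of link strengths described above. The sketch says nothing about classification accuracy.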
