Training morphological neural networks with gradient descent: some theoretical insights

Samy Blusseau

Abstract

Morphological neural networks, or layers, can be a powerful tool to boost progress in mathematical morphology, either on theoretical aspects such as the representation of complete lattice operators, or in the development of image processing pipelines. However, these architectures turn out to be difficult to train when they contain more than a few morphological layers, at least within popular machine learning frameworks that use gradient-descent-based optimization algorithms. In this paper we investigate the potential and limitations of differentiation-based approaches and back-propagation applied to morphological networks, in light of the non-smooth optimization concept of the Bouligand derivative. We provide insights and first theoretical guidelines, in particular regarding initialization and learning rates.
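
To make the training difficulty concrete, here is a minimal sketch, assuming the common max-plus "dilation" form of a morphological layer, $y_i = \max_j (w_{ij} + x_j)$. That form is an assumption, since the abstract does not define the layer. The function names and the argmax tie-breaking convention are hypothetical and only illustrate how back-propagation might treat such a layer.

```python
import numpy as np

def dilation_layer(W, x):
    """Assumed max-plus 'dilation' layer: y_i = max_j (W[i, j] + x[j])."""
    return np.max(W + x[None, :], axis=1)

def dilation_grad_W(W, x, upstream):
    """Gradient of sum_i upstream[i] * y_i with respect to W.

    Uses the argmax convention (an assumption): in each row, only the
    first maximising entry receives gradient, all others get zero.
    """
    winners = np.argmax(W + x[None, :], axis=1)
    grad = np.zeros_like(W)
    grad[np.arange(W.shape[0]), winners] = upstream
    return grad

# Illustrative values only, not taken from the paper.
W = np.array([[0.0, -1.0, 3.0],
              [1.0,  0.5, -3.0]])
x = np.array([1.0, 3.0, 0.0])

y = dilation_layer(W, x)
g = dilation_grad_W(W, x, np.ones(2))
```

Under these assumptions, each output updates a single weight per row at each step, and the layer is not differentiable wherever two entries tie for the maximum, which is the kind of non-smooth behaviour the abstract refers to.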

Paper Structure (21 sections, 1 theorem, 31 equations, 1 figure)

This paper contains 21 sections, 1 theorem, 31 equations, 1 figure.

Key Result

Proposition 1

For fixed $W, H\in \mathbb{R}^{m\times n}$ and $\mathbf{x}\in\mathbb{R}^n$, let $\varphi_{\mathbf{x}, i}$, $J_i$ and $K_i$ be as defined by eq:phi_x_i, eq:J_wi_x and eq:Ki_1 respectively, for $1\leq i\leq m$. Let $\epsilon = \min_{1\leq i\leq m} \epsilon_i$. Then, for any $\eta\in\mathbb{R}^+$ we have

Figures (1)

  • Figure 1: illustration of the chain rule algorithm.

Theorems & Definitions (2)

  • Proposition 1
  • Proof of Proposition \ref{prop:affine_interval}