Training morphological neural networks with gradient descent: some theoretical insights

Samy Blusseau

TL;DR

...

Abstract

Morphological neural networks, or layers, can be a powerful tool to boost progress in mathematical morphology, both on theoretical aspects, such as the representation of complete lattice operators, and in the development of image processing pipelines. However, these architectures turn out to be difficult to train when they contain more than a few morphological layers, at least within popular machine learning frameworks that rely on gradient-descent-based optimization algorithms. In this paper we investigate the potential and limitations of differentiation-based approaches and back-propagation applied to morphological networks, in light of the Bouligand derivative, a concept from non-smooth optimization. We provide insights and first theoretical guidelines, in particular regarding initialization and learning rates.
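To make the training difficulty concrete, here is a minimal sketch of one gradient descent step on a single morphological layer. It assumes the common max-plus dilation layer $y_i = \max_j (x_j + w_{ij})$ and a squared loss; the paper's exact layer convention may differ, and the names `dilation` and `gd_step` are hypothetical. Away from ties, the gradient of $y_i$ with respect to row $i$ of $W$ is one-hot at the maximizing index, so each sample updates at most one weight per row.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dilation(W: Matrix, x: Vector) -> Vector:
    """Assumed max-plus dilation layer: y_i = max_j (x_j + W[i][j])."""
    return [max(xj + wij for xj, wij in zip(x, row)) for row in W]


def gd_step(W: Matrix, x: Vector, target: Vector, lr: float) -> Matrix:
    """One gradient descent step on L = 0.5 * sum_i (y_i - target_i)^2.

    Away from ties, dy_i/dW[i][j] is 1 for the maximizing j and 0 otherwise,
    so only one entry per row of W is updated for this sample.
    """
    new_W = [row[:] for row in W]
    for i, row in enumerate(W):
        scores = [xj + wij for xj, wij in zip(x, row)]
        j_star = max(range(len(scores)), key=scores.__getitem__)  # first maximizer
        residual = scores[j_star] - target[i]                      # dL/dy_i
        new_W[i][j_star] -= lr * residual                          # sparse update
    return new_W


W = [[0.0, -1.0, 2.0], [1.0, 0.0, -3.0]]
x = [1.0, 4.0, 0.0]
new_W = gd_step(W, x, target=[2.0, 5.0], lr=0.5)
print(new_W)  # only W[0][1] and W[1][1] have changed
```

Here only two of the six weights move. With a larger step, the maximizing index itself can change after the update, which hints at why the choice of learning rate matters for these layers.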

Paper Structure (21 sections, 1 theorem, 31 equations, 1 figure)

Introduction
Morphological networks
Optimization with gradient descent
Gradient descent
Back propagation and the chain rule
The Bouligand derivative
Optimization with the Bouligand derivative
Derivatives of the morphological layers
Derivative with respect to $W$
Derivative with respect to $\mathbf{x}$
Updating the parameters
Problem setting.
Proposition of candidates $\Delta W$.
Choosing the learning rate.
Message passing
...and 6 more sections

Key Result

Proposition 1

For fixed $W, H\in \mathbb{R}^{m\times n}$ and $\mathbf{x}\in\mathbb{R}^n$, let $\varphi_{\mathbf{x}, i}$, $J_i$ and $K_i$ be as defined by eq:phi_x_i, eq:J_wi_x and eq:Ki_1 respectively, for $1\leq i\leq m$. Let and $\epsilon = \min_{1\leq i\leq m} \epsilon_i$. Then, for any $\eta\in\mathbb{R}^+$ we have
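As a hedged illustration of the kind of property at stake, the sketch below works on a single row and assumes $\varphi(\mathbf{w}) = \max_j (w_j + x_j)$, a max-plus dilation that may not match the paper's eq:phi_x_i exactly. For such a function, the directional (Bouligand) derivative in direction $\mathbf{h}$ is $\max_{j\in J} h_j$, where $J$ is the set of maximizers, and $\varphi(\mathbf{w} + \eta\mathbf{h}) = \varphi(\mathbf{w}) + \eta \max_{j\in J} h_j$ holds on an interval $0 \le \eta \le \epsilon$. The threshold `eps` computed here is derived for this toy case only and is not the paper's $\epsilon_i$. The names `directional_derivative` and `phi` are hypothetical.

```python
import math
from typing import List, Tuple


def phi(w: List[float], x: List[float]) -> float:
    """Assumed single-row max-plus layer: phi(w) = max_j (w_j + x_j)."""
    return max(wj + xj for wj, xj in zip(w, x))


def directional_derivative(w: List[float], x: List[float],
                           h: List[float]) -> Tuple[float, float]:
    """Return (d, eps) with d = max_{j in J} h_j, J the maximizers of w_j + x_j,
    and eps the largest step such that phi(w + eta*h) = phi(w) + eta*d
    for all 0 <= eta <= eps (math.inf if it holds for every eta >= 0)."""
    s = [wj + xj for wj, xj in zip(w, x)]
    top = max(s)
    J = [j for j, sj in enumerate(s) if sj == top]  # exact ties, for illustration
    d = max(h[j] for j in J)
    eps = math.inf
    for j, sj in enumerate(s):
        # A non-maximizer can only overtake if it grows faster than d.
        if j not in J and h[j] > d:
            eps = min(eps, (top - sj) / (h[j] - d))
    return d, eps


w, x, h = [0.0, 1.0, 1.0], [0.0, 0.0, 0.0], [3.0, -1.0, 0.5]
d, eps = directional_derivative(w, x, h)
print(d, eps)  # derivative 0.5 along h, affine regime up to eta = 0.4
```

Beyond `eps`, index 0 overtakes the tied maximizers and the affine expansion breaks down (for instance at $\eta = 1$).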

Figures (1)

Figure 1: Illustration of the chain rule algorithm.
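One plausible reading of the chain rule algorithm, sketched for two stacked layers, is shown below. This is not a reproduction of Figure 1. It assumes that each layer is a max-plus dilation $y_i = \max_j (x_j + w_{ij})$, that ties are broken by taking the first maximizer, and that the loss is the sum of the outputs. The helpers `forward`, `backward` and `loss_and_grads` are hypothetical. Back-propagation routes each upstream gradient along the argmax path only, so gradients become sparse as depth grows.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def forward(W: Matrix, x: Vector) -> Tuple[Vector, List[int]]:
    """Max-plus dilation y_i = max_j (x_j + W[i][j]) plus the argmax per output."""
    y, arg = [], []
    for row in W:
        s = [xj + wij for xj, wij in zip(x, row)]
        j = max(range(len(s)), key=s.__getitem__)  # first maximizer on ties
        y.append(s[j])
        arg.append(j)
    return y, arg


def backward(arg: List[int], n: int, g: Vector) -> Tuple[Matrix, Vector]:
    """Chain rule for one layer: send upstream gradient g[i] to the argmax entry."""
    gW = [[0.0] * n for _ in arg]
    gx = [0.0] * n
    for i, j in enumerate(arg):
        gW[i][j] += g[i]
        gx[j] += g[i]
    return gW, gx


def loss_and_grads(W1: Matrix, W2: Matrix, x: Vector):
    """L = sum_i z_i with z = layer2(layer1(x)); gradients from top to bottom."""
    y, a1 = forward(W1, x)
    z, a2 = forward(W2, y)
    gz = [1.0] * len(z)                  # dL/dz
    gW2, gy = backward(a2, len(y), gz)   # dL/dW2 and dL/dy
    gW1, gx = backward(a1, len(x), gy)   # dL/dW1 and dL/dx
    return sum(z), gW1, gW2, gx


x = [0.0, 2.0]
W1 = [[1.0, 0.0], [0.0, -1.0], [3.0, -5.0]]
W2 = [[0.0, 0.5, -0.5]]
L, gW1, gW2, gx = loss_and_grads(W1, W2, x)
print(L, gW1, gW2, gx)  # a single entry of W1 receives a non-zero gradient
```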

Theorems & Definitions (2)

Proposition 1
Proof of Proposition 1