Continuously Differentiable Exponential Linear Units

Jonathan T. Barron

TL;DR

This work presents an alternative parametrization of ELU which is C1 continuous for all values of alpha, making the rectifier easier to reason about and making alpha easier to tune.

Abstract

Exponential Linear Units (ELUs) are a useful rectifier for constructing deep learning architectures, as they may speed up and otherwise improve learning by virtue of not having vanishing gradients and by having mean activations near zero. However, the ELU activation as parametrized in [1] is not continuously differentiable with respect to its input when the shape parameter alpha is not equal to 1. We present an alternative parametrization which is C1 continuous for all values of alpha, making the rectifier easier to reason about and making alpha easier to tune. This alternative parametrization has several other useful properties that the original parametrization of ELU does not: 1) its derivative with respect to x is bounded, 2) it contains both the linear transfer function and ReLU as special cases, and 3) it is scale-similar with respect to alpha.
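The properties listed in the abstract can be checked numerically. The sketch below is illustrative rather than the author's code: it assumes the original ELU is x for x >= 0 and alpha * (exp(x) - 1) otherwise, and that the reparametrization replaces exp(x) with exp(x / alpha) for alpha > 0. The function names `elu`, `celu`, and `one_sided_slopes` are hypothetical helpers introduced here.

```python
import numpy as np

def elu(x, alpha=1.0):
    """Original ELU (assumed form): x for x >= 0, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    # Clamp the exponent to <= 0 so the unused branch of np.where cannot overflow.
    return np.where(x >= 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def celu(x, alpha=1.0):
    """Reparametrized ELU (assumed form): x for x >= 0,
    alpha * (exp(x / alpha) - 1) otherwise, with alpha > 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, alpha * np.expm1(np.minimum(x, 0.0) / alpha))

def one_sided_slopes(f, alpha, h=1e-6):
    """Finite-difference slopes just left and just right of x = 0."""
    left = (f(0.0, alpha) - f(-h, alpha)) / h
    right = (f(h, alpha) - f(0.0, alpha)) / h
    return float(left), float(right)

if __name__ == "__main__":
    alpha = 2.0
    x = np.linspace(-3.0, 3.0, 13)

    # ELU: the left slope at 0 approaches alpha, the right slope is 1.
    print("ELU slopes at 0: ", one_sided_slopes(elu, alpha))
    # Reparametrized form: both slopes approach 1 for any alpha > 0.
    print("CELU slopes at 0:", one_sided_slopes(celu, alpha))

    # Scale-similarity: celu(x, alpha) == celu(c * x, c * alpha) / c.
    c = 5.0
    print("scale-similar:", np.allclose(celu(x, alpha), celu(c * x, c * alpha) / c))

    # Special cases: small alpha approaches ReLU, large alpha approaches identity.
    print("near ReLU:  ", np.allclose(celu(x, 1e-3), np.maximum(x, 0.0), atol=2e-3))
    print("near linear:", np.allclose(celu(x, 1e6), x, atol=1e-4))
```

With alpha = 1 the two functions coincide, which is why the original ELU is only continuously differentiable in that one case.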

Paper Structure

This paper contains 7 equations, 1 figure.

Figures (1)

  • Figure 1: The ELU activation function (top) as described in [1] is not continuously differentiable with respect to $x$ for all values of $\alpha$. Our reparametrization (bottom) gives an activation function with the benefits of ELU, while being continuously differentiable, scale-similar, containing a linear function as a special case, and not having an unbounded derivative.
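
The contrast in the caption can be read off the derivatives with respect to $x$. As a sketch, assuming the original ELU uses $\alpha(e^{x} - 1)$ for $x < 0$ and the reparametrization uses $\alpha(e^{x/\alpha} - 1)$ with $\alpha > 0$ (written CELU here as shorthand):

$$
\frac{d}{dx}\,\mathrm{ELU}(x, \alpha) =
\begin{cases} 1 & x \ge 0 \\ \alpha e^{x} & x < 0 \end{cases}
\qquad
\frac{d}{dx}\,\mathrm{CELU}(x, \alpha) =
\begin{cases} 1 & x \ge 0 \\ e^{x/\alpha} & x < 0 \end{cases}
$$

As $x \to 0^{-}$ the ELU slope tends to $\alpha$, which matches the right-hand slope of 1 only when $\alpha = 1$, and it grows without bound as $\alpha$ increases. Under the same assumptions the CELU slope tends to 1 for every $\alpha > 0$ and stays in $(0, 1]$.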