
Leaky ReLUs That Differ in Forward and Backward Pass Facilitate Activation Maximization in Deep Neural Networks

Christoph Linse, Erhardt Barth, Thomas Martinetz

TL;DR

ProxyGrad is an optimization technique for neural networks that computes gradients through a secondary proxy network, designed to have a simpler loss landscape with fewer local maxima than the original network.

Abstract

Activation maximization (AM) strives to generate optimal input stimuli, revealing features that trigger high responses in trained deep neural networks. AM is an important method of explainable AI. We demonstrate that AM fails to produce optimal input stimuli for simple functions containing ReLUs or Leaky ReLUs, casting doubt on the practical usefulness of AM and the visual interpretation of the generated images. This paper proposes a solution based on using Leaky ReLUs with a high negative slope in the backward pass while keeping the original, usually zero, slope in the forward pass. The approach significantly increases the maxima found by AM. The resulting ProxyGrad algorithm implements a novel optimization technique for neural networks that employs a secondary network as a proxy for gradient computation. This proxy network is designed to have a simpler loss landscape with fewer local maxima than the original network. Our chosen proxy network is an identical copy of the original network, including its weights, with distinct negative slopes in the Leaky ReLUs. Moreover, we show that ProxyGrad can be used to train the weights of Convolutional Neural Networks for classification such that, on some of the tested benchmarks, they outperform traditional networks.
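The core idea of the abstract, a Leaky ReLU whose slope differs between the forward and backward pass, can be sketched in plain Python for a hypothetical one-hidden-layer network y = w2 · LReLU(W1 x). This sketch assumes one possible reading of the mechanism: the forward pass uses the original slope (0.0, i.e. a ReLU), and the backward pass evaluates the Leaky ReLU derivative with a larger slope at the same pre-activations. The function name `forward_and_input_grad`, the weights, and the backward slope 0.5 are illustrative assumptions, not the paper's code or recommended settings.

```python
# Sketch (assumed reading of the abstract): forward pass with the original
# Leaky ReLU slope, backward pass with a different slope evaluated at the
# same pre-activations. Network shape, names and values are hypothetical.


def lrelu(z: float, slope: float) -> float:
    """Leaky ReLU: z for z > 0, slope * z otherwise (slope 0.0 gives a ReLU)."""
    return z if z > 0 else slope * z


def lrelu_deriv(z: float, slope: float) -> float:
    """Leaky ReLU derivative (taken as `slope` at z == 0)."""
    return 1.0 if z > 0 else slope


def forward_and_input_grad(x: list[float], W1: list[list[float]], w2: list[float],
                           forward_slope: float, backward_slope: float):
    """Return y = w2 . LReLU_forward(W1 x) and dy/dx computed with LReLU_backward'."""
    # Hidden pre-activations z_j = sum_i W1[j][i] * x[i]
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    # Forward pass uses the original slope
    y = sum(v * lrelu(zj, forward_slope) for v, zj in zip(w2, z))
    # Backward pass: per-unit gradient uses the (possibly larger) backward slope
    delta = [v * lrelu_deriv(zj, backward_slope) for v, zj in zip(w2, z)]
    # Chain rule back to the input: dy/dx_i = sum_j delta_j * W1[j][i]
    grad = [sum(delta[j] * W1[j][i] for j in range(len(W1))) for i in range(len(x))]
    return y, grad


if __name__ == "__main__":
    W1 = [[1.0, 0.0], [0.0, 1.0]]
    w2 = [1.0, 1.0]
    x = [-1.0, -1.0]  # both hidden units are inactive
    print(forward_and_input_grad(x, W1, w2, 0.0, 0.0))  # -> (0.0, [0.0, 0.0])
    print(forward_and_input_grad(x, W1, w2, 0.0, 0.5))  # -> (0.0, [0.5, 0.5])
```

When every hidden unit is inactive, the true ReLU gradient with respect to the input is zero, so gradient-based AM cannot move the input. The proxy gradient stays nonzero and keeps pushing the input toward the region where the units switch on, while the forward output is unchanged.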

Paper Structure

This paper contains 17 sections, 8 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Illustration of function $f_3(\mathbf{x}) = \sum \text{LReLU}_s(\mathbf{x}) + \sum \text{LReLU}_s(-p\mathbf{x})$ with $p=0.2$ and $s=0.1$ for an input image with a single pixel.
  • Figure 2: Mean activation of the generated images as measured by the original ReLU network. Top: after 10 iterations. Bottom: after 500 iterations.
  • Figure 3: AM with a high negative slope of 0.3, far beyond the optimal range. In this negative example, the class 'poncho' is visualized. Best viewed digitally with zoom. Left: Leaky ReLU network. Right: ReLU network with ProxyGrad.
  • Figure 4: Standard deviation of the input to batch normalization layers taken from different locations in ResNet with Leaky ReLUs.
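The toy function in Figure 1 shows why plain gradient ascent can get stuck. For a single pixel with p = 0.2 and s = 0.1, f_3 grows as (1 - sp)x = 0.98x for x > 0 but as (s - p)x = -0.1x for x < 0, so on the negative side the true gradient is -0.1 and pushes the input away from the larger maximum. The sketch below runs projected gradient ascent with the true gradient and with a proxy gradient whose Leaky ReLU derivative uses a larger slope. The clipping range [-1, 1], learning rate, step count, starting point and the proxy slope 0.5 are assumptions chosen for illustration. In this single-layer case the pre-activations do not depend on the slope, so the proxy gradient is simply the derivative of the same expression with s replaced by 0.5.

```python
# Sketch: activation maximization (projected gradient ascent) on the
# single-pixel toy function f_3 from Figure 1, using either the true
# gradient or a proxy gradient with a larger Leaky ReLU slope in the
# backward pass. Clipping range, step size, step count and the proxy
# slope are illustrative assumptions.

P = 0.2  # p from Figure 1
S = 0.1  # forward negative slope s from Figure 1


def lrelu(z: float, slope: float) -> float:
    """Leaky ReLU: z for z > 0, slope * z otherwise."""
    return z if z > 0 else slope * z


def lrelu_deriv(z: float, slope: float) -> float:
    """Leaky ReLU derivative (taken as `slope` at z == 0)."""
    return 1.0 if z > 0 else slope


def f3(x: float) -> float:
    """f_3(x) = LReLU_s(x) + LReLU_s(-p x) for one pixel."""
    return lrelu(x, S) + lrelu(-P * x, S)


def f3_grad(x: float, backward_slope: float) -> float:
    """d f_3 / dx with the Leaky ReLU derivative evaluated at `backward_slope`.

    backward_slope == S gives the true gradient; a larger value gives the
    proxy gradient. With a single Leaky ReLU layer the pre-activations x and
    -p*x do not depend on the slope, so no separate proxy forward pass is needed.
    """
    return lrelu_deriv(x, backward_slope) - P * lrelu_deriv(-P * x, backward_slope)


def activation_maximization(x0: float, backward_slope: float,
                            lr: float = 0.05, steps: int = 200,
                            lo: float = -1.0, hi: float = 1.0) -> float:
    """Gradient ascent on f_3 with the input clipped to [lo, hi] after each step."""
    x = x0
    for _ in range(steps):
        x = min(max(x + lr * f3_grad(x, backward_slope), lo), hi)
    return x


if __name__ == "__main__":
    x_true = activation_maximization(-0.5, backward_slope=S)     # true gradient
    x_proxy = activation_maximization(-0.5, backward_slope=0.5)  # proxy gradient
    print(x_true, round(f3(x_true), 4))    # -> -1.0 0.1
    print(x_proxy, round(f3(x_proxy), 4))  # -> 1.0 0.98
```

Starting from x = -0.5, the true gradient drives the input to the boundary x = -1, where f_3 = 0.1. The proxy gradient is 0.5 - 0.2 = 0.3 > 0 on the negative side, so it carries the input across zero to x = 1, where f_3 = 0.98.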