Activation Bottleneck: Sigmoidal Neural Networks Cannot Forecast a Straight Line

Maximilian Toller; Hussain Hussain; Bernhard C Geiger

TL;DR

The paper investigates activation bottlenecks, defined as hidden layers with bounded image, and shows that sigmoidal networks, including typical LSTM/GRU configurations, cannot forecast unbounded sequences such as straight lines or trends. It formalizes the concept with definitions and the maximum approximation error $\varepsilon^\star_{f,g}$, and proves that a bottleneck followed by Lipschitz layers yields a bounded $g$; for surjective targets with unbounded domains this leads to $\varepsilon^\star_{f,g}=\infty$. Empirically, networks with bottlenecks fail to fit simple unbounded sequences while architectures without bottlenecks succeed. The paper also proposes mitigation strategies (skip-connections, switching to ReLU/linear activations, or adding non-Lipschitz layers) and provides practical guidance on handling unbounded data in real-world tasks.
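
The step from a bounded $g$ to an infinite error can be sketched in one line. This is only an illustration under assumptions that are not quoted from the paper: the maximum approximation error is taken to be the worst-case gap $\varepsilon^\star_{f,g}=\sup_{x}\lVert f(x)-g(x)\rVert$ over the shared domain, and $M$ is a hypothetical constant with $\lVert g(x)\rVert \le M$ for all $x$. By the reverse triangle inequality,

$$
\lVert f(x)-g(x)\rVert \;\ge\; \lVert f(x)\rVert-\lVert g(x)\rVert \;\ge\; \lVert f(x)\rVert-M .
$$

If the target $f$ is unbounded, then for every $C>0$ there is an $x$ with $\lVert f(x)\rVert > M+C$, so the gap exceeds $C$ and the supremum is infinite.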

Abstract

A neural network has an activation bottleneck if one of its hidden layers has a bounded image. We show that networks with an activation bottleneck cannot forecast unbounded sequences such as straight lines, random walks, or any sequence with a trend: the difference between prediction and ground truth becomes arbitrarily large, regardless of the training procedure. Widely used neural network architectures such as LSTM and GRU suffer from this limitation. In our analysis, we characterize activation bottlenecks and explain why they prevent sigmoidal networks from learning unbounded sequences. We experimentally validate our findings and discuss modifications to network architectures that mitigate the effects of activation bottlenecks.
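
The limitation can be checked numerically with a small sketch. The network below is a hypothetical one-hidden-layer tanh model with random placeholder weights (not a model or result from the paper). Whatever the weights are, the output cannot leave the interval given by `bound`, so the error against the straight line $f(x)=x$ grows with $|x|$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical network: g(x) = w2 . tanh(w1 * x + b1) + b2.
# The weights are random placeholders, not trained values from the paper.
hidden = 16
w1 = rng.normal(size=hidden)
b1 = rng.normal(size=hidden)
w2 = rng.normal(size=hidden)
b2 = 0.5

def g(x):
    """Evaluate the network on a 1-D array of scalar inputs."""
    h = np.tanh(np.outer(x, w1) + b1)  # hidden layer: every entry lies in [-1, 1]
    return h @ w2 + b2                 # linear (hence Lipschitz) readout

# Since |tanh| <= 1, |g(x)| <= sum|w2| + |b2| for every input x.
bound = np.abs(w2).sum() + abs(b2)

x = np.linspace(-1e6, 1e6, 2001)
y = g(x)
target = x                             # the straight line f(x) = x

max_abs_output = np.abs(y).max()
errors = np.abs(target - y)
print(f"bound on |g|: {bound:.3f}, largest |g(x)| seen: {max_abs_output:.3f}")
print(f"error at x = 1e6: {errors[-1]:.1f}")
```

Training changes `w2` and `b2`, and therefore `bound`, but it cannot remove the bound, which is why the choice of training procedure does not help.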

Paper Structure (12 sections, 2 theorems, 4 equations, 1 figure)

This paper contains 12 sections, 2 theorems, 4 equations, 1 figure.

Introduction
Theory
Definitions
Results
Experiments
Discussion
Link to theoretical results.
How to fix activation bottlenecks.
Practical advice.
Proofs
Proof of Lemma 1
Proof of Theorem 1

Key Result

lemma 1

If neural network $g$ has an activation bottleneck in hidden layer $h_i$ and the layers after $h_i$ are Lipschitz continuous, then the image of $g$ is bounded.
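
A sketch of why the lemma is plausible, in notation introduced here for illustration rather than taken from the paper: write $g=\varphi\circ h_{\le i}$, where $h_{\le i}$ maps the input to the output of layer $h_i$ and $\varphi$ is the composition of all later layers. Assume the image of $h_{\le i}$ lies in a ball of radius $B$, and that $\varphi$ is Lipschitz with constant $L$ (for example, the product of the Lipschitz constants of the later layers). Fixing any reference point $z_0$ in that image, every $z=h_{\le i}(x)$ satisfies

$$
\lVert \varphi(z)\rVert \;\le\; \lVert \varphi(z_0)\rVert + L\,\lVert z-z_0\rVert \;\le\; \lVert \varphi(z_0)\rVert + 2LB ,
$$

so $\lVert g(x)\rVert$ is bounded by a constant that does not depend on $x$.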

Figures (1)

Figure 1: Results of the experiment. (Left) Models with activation bottleneck struggle to fit/forecast a simple straight line. (Right) Models without activation bottleneck do not have this limitation and easily solve this learning task.
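
To illustrate why architectures without a bottleneck do not face this limit, here is a minimal sketch of two hand-built models. They are illustrative constructions, not the models used in the experiment. `g_relu` uses two ReLU units whose hidden layer has an unbounded image and reproduces the line exactly. `g_skip` keeps a tanh unit but adds an identity skip-connection, with hypothetical parameters `w` and `b`, so its deviation from the line stays bounded instead of growing.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def g_relu(x):
    """Two hidden ReLU units: relu(x) - relu(-x) equals x for every real x."""
    h = np.stack([relu(x), relu(-x)], axis=-1)  # hidden layer with unbounded image
    return h @ np.array([1.0, -1.0])            # linear readout

def g_skip(x, w=0.3, b=-0.1):
    """Hypothetical tanh unit plus an identity skip-connection around it."""
    return w * np.tanh(x) + b + x               # deviation from x is at most |w| + |b|

x = np.linspace(-1e6, 1e6, 2001)
relu_error = np.abs(g_relu(x) - x).max()
skip_error = np.abs(g_skip(x) - x).max()
print(relu_error, skip_error)
```

In both cases the worst-case error over the tested range is independent of how far the input extends, in contrast to a model whose output passes only through a bounded hidden layer.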

Theorems & Definitions (6)

definition 1: Maximum Approximation Error
definition 2: Activation Bottleneck
lemma 1
theorem 1
proof
proof
