On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks
Hubert Leterme, Kévin Polisano, Valérie Perrier, Karteek Alahari
TL;DR
This work analyzes how max pooling in CNNs interacts with first-layer Gabor-like filters to influence shift invariance. By introducing a complex modulus operator $U^{\mathrm{mod}}$ and a real max-pooling operator $U^{\mathrm{max}}$, the authors develop a probabilistic framework and derive bounds that quantify when $U^{\mathrm{max}}$ approximates $U^{\mathrm{mod}}$ under bandwidth and grid-resolution constraints. The theory is extended to multichannel convolutions and validated through a case study on a deterministic feature extractor, the dual-tree complex wavelet packet transform (DT-$\mathbb{C}$WPT), showing that modulus-based representations are nearly shift invariant and can serve as a stable proxy for real max pooling in practice. The results identify a regime in which the $\mathbb{R}$Max and $\mathbb{C}$Mod outputs align closely, informing architecture design that preserves high-frequency information while achieving shift stability in early CNN layers.
Abstract
This paper focuses on improving the mathematical interpretability of convolutional neural networks (CNNs) in the context of image classification. Specifically, we tackle the instability issue arising in their first layer, which tends to learn parameters that closely resemble oriented band-pass filters when trained on datasets like ImageNet. Subsampled convolutions with such Gabor-like filters are prone to aliasing, causing sensitivity to small input shifts. In this context, we establish conditions under which the max pooling operator approximates a complex modulus, which is nearly shift invariant. We then derive a measure of shift invariance for subsampled convolutions followed by max pooling. In particular, we highlight the crucial role played by the filter's frequency and orientation in achieving stability. We experimentally validate our theory by considering a deterministic feature extractor based on the dual-tree complex wavelet packet transform, a particular case of discrete Gabor-like decomposition.
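The claim that max pooling can approximate a complex modulus can be illustrated with a small 1D toy experiment. The sketch below is not the authors' operators or their experimental setup: the helper names (`gabor_filter`, `cmod`, `rmax`), the Gaussian window, the input signal, and all parameter values are assumptions chosen for illustration. It convolves a cosine with a complex Gabor-like filter, then compares the modulus of the complex response with the max-pooled real part for two pooling sizes.

```python
import numpy as np

def gabor_filter(xi, sigma=4.0, half_width=16):
    """Hypothetical 1D Gabor-like filter: Gaussian window times a complex exponential."""
    n = np.arange(-half_width, half_width + 1)
    return np.exp(-n**2 / (2.0 * sigma**2)) * np.exp(1j * xi * n)

def cmod(x, w, stride):
    """Complex convolution, subsampling by `stride`, then modulus."""
    z = np.convolve(x, w, mode="valid")
    n_out = len(z) // stride
    return np.abs(z[: n_out * stride : stride])

def rmax(x, w, pool):
    """Convolution with the real part of w, then non-overlapping max pooling."""
    y = np.convolve(x, w.real, mode="valid")
    n_out = len(y) // pool
    return y[: n_out * pool].reshape(n_out, pool).max(axis=1)

xi = np.pi / 4                                   # center frequency: period of 8 samples
w = gabor_filter(xi)
n = np.arange(256)
x = np.cos(xi * n)                               # toy input at the filter frequency
x_shifted = np.cos(xi * (n - 1))                 # same input shifted by one sample

ref = cmod(x, w, stride=8)                       # modulus-based features
amplitude = ref.mean()                           # the modulus is nearly constant here

full = rmax(x, w, pool=8)                        # window covers a full period
short = rmax(x, w, pool=2)                       # window covers a quarter period

# Relative gap between max pooling and the modulus for each window size.
err_full = np.max(np.abs(full - ref)) / amplitude
err_short = np.max(np.abs(short - amplitude)) / amplitude

# Relative change of the max-pooled output under a one-sample input shift.
shift_full = np.max(np.abs(rmax(x_shifted, w, pool=8) - full)) / amplitude
shift_short = np.max(np.abs(rmax(x_shifted, w, pool=2) - short)) / amplitude

print(f"pool=8: gap to modulus {err_full:.2e}, change under shift {shift_full:.2e}")
print(f"pool=2: gap to modulus {err_short:.2e}, change under shift {shift_short:.2e}")
```

In this toy setting, a pooling window that spans a full oscillation of the real response always contains a sample near the peak, so the max-pooled output tracks the modulus and barely moves when the input is shifted. With the shorter window, the output depends on where the window falls relative to the phase of the response, which is the kind of dependence on filter frequency and pooling grid that the abstract refers to.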
