Beyond Skip Connection: Pooling and Unpooling Design for Elimination Singularities
Chengkun Sun, Jinqian Pan, Zhuoli Jin, Russell Stevens Terry, Jiang Bian, Jie Xu
TL;DR
Pool Skip addresses elimination singularities that hamper training of deep CNNs by integrating a pooling-unpooling path with a small convolution and skip connections, underpinned by the Weight Inertia hypothesis and compensation theory. The approach stabilizes gradient flow and enables updates to previously inert weights, yielding performance gains across 2D classification, 2D segmentation, and 3D medical image segmentation tasks. Empirical results on CIFAR-10/100, Cityscapes, Pascal VOC, BTCV, and AMOS show consistent improvements across CNNs and, to a lesser extent, ViT variants, validating the practicality of pooling-based compensation in deep architectures. The work provides a theoretical framework and a lightweight architectural module that improves training robustness with potential broad applicability to other deep models.
Abstract
Training deep Convolutional Neural Networks (CNNs) presents unique challenges, including the pervasive issue of elimination singularities: the consistent deactivation of nodes, which leads to degenerate manifolds within the loss landscape. These singularities impede efficient learning by disrupting feature propagation. To mitigate this, we introduce Pool Skip, an architectural enhancement that strategically combines a Max Pooling layer, a Max Unpooling layer, a 3×3 convolution, and a skip connection. This configuration helps stabilize the training process and maintain feature integrity across layers. We also propose the Weight Inertia hypothesis, which underpins the development of Pool Skip and provides theoretical insights into mitigating the degradation caused by elimination singularities through dimensional and affine compensation. We evaluate our method on a variety of benchmarks, covering both 2D natural and 3D medical imaging applications and tasks such as classification and segmentation. Our findings highlight Pool Skip's effectiveness in facilitating more robust CNN training and improving model performance.
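The four components named in the abstract can be wired together in a few lines of PyTorch. The following is only a minimal sketch, not the authors' implementation: the ordering (pool, then convolve, then unpool, then add the skip), the pooling window of 2, the channel-preserving convolution, and the class name `PoolSkipSketch` are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class PoolSkipSketch(nn.Module):
    """Hypothetical Pool Skip-style block (layer ordering is an assumption)."""

    def __init__(self, channels: int, pool_size: int = 2):
        super().__init__()
        # Max Pooling that also returns the argmax locations for unpooling.
        self.pool = nn.MaxPool2d(pool_size, stride=pool_size, return_indices=True)
        # 3x3 convolution on the pooled map; padding=1 keeps its spatial size.
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Max Unpooling scatters values back to the recorded argmax locations.
        self.unpool = nn.MaxUnpool2d(pool_size, stride=pool_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pooled, indices = self.pool(x)
        refined = self.conv(pooled)
        # output_size restores the exact input resolution.
        restored = self.unpool(refined, indices, output_size=x.shape)
        # Skip connection: the input bypasses the pooling path.
        return x + restored


torch.manual_seed(0)
block = PoolSkipSketch(channels=8)
x = torch.randn(2, 8, 16, 16)
y = block(x)
print(y.shape)
```

Because the unpooled map is zero everywhere except at the max locations, the pooling path in this sketch adds a sparse correction on top of the identity path, so the block preserves the input shape and can be dropped between existing convolutional layers.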
