Squeeze-and-Remember Block
Rinor Cakaj, Jens Mehnert, Bin Yang
TL;DR
The Squeeze-and-Remember (SR) block adds a dynamic memory-like mechanism to CNNs. It squeezes the input with a $1 \times 1$ convolution, recalls learned high-level features through an FCN-guided weighting of memory blocks, and adds the resulting memory to the original feature map. This design enables context-aware feature augmentation in non-sequential image tasks. It improves ImageNet top-1 accuracy by 0.52% over dropout2d alone (ResNet50) and Cityscapes mIoU by 0.20% (DeepLab v3), with modest parameter and compute overhead. Empirical results across CIFAR, ImageNet, and Cityscapes show consistent improvements. The gains are largest when the SR block is combined with regularizers such as dropout2d and with attention blocks such as SE and CBAM. Analyses also reveal class-dependent memory utilization. The work positions SR as a complementary alternative to recalibration-based attention, extending CNNs’ ability to remember and reuse learned features for improved inference in diverse visual tasks.
Abstract
Convolutional Neural Networks (CNNs) are important for many machine learning tasks. They are built from different types of layers: convolutional layers that detect features, dropout layers that help to avoid over-reliance on any single neuron, and residual layers that allow the reuse of features. However, CNNs lack a dynamic feature retention mechanism similar to the human brain's memory, which limits their ability to use learned information in new contexts. To bridge this gap, we introduce the "Squeeze-and-Remember" (SR) block, a novel architectural unit that gives CNNs dynamic memory-like functionality. The SR block selectively memorizes important features during training and then adaptively re-applies these features during inference, which improves the network's ability to make contextually informed predictions. Empirical results on the ImageNet and Cityscapes datasets demonstrate the SR block's efficacy. Integrating it into ResNet50 improved top-1 validation accuracy on ImageNet by 0.52% over dropout2d alone, and applying it in DeepLab v3 increased mean Intersection over Union (mIoU) on Cityscapes by 0.20%. These improvements come with minimal computational overhead, showing the SR block's potential to enhance the capabilities of CNNs in image processing tasks.
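The squeeze, recall, and add steps outlined above can be illustrated with a minimal forward-pass sketch. This is one plausible reading, not the authors' implementation. The following choices are assumptions of this sketch:

- the ReLU after the $1 \times 1$ squeeze convolution;
- the global average pooling between the squeeze and the FCN;
- a single fully connected layer with a softmax over memory weights;
- memory blocks with the same shape as the input feature map.

All function and parameter names (`sr_block`, `squeeze_w`, `fc_w`, `memory`, ...) are hypothetical. Tensors are plain nested lists shaped `[channels][height][width]`, so the sketch runs without any deep learning library.

```python
import math

# Hypothetical, simplified Squeeze-and-Remember forward pass (assumed design, see lead-in).
# Tensors are nested lists shaped [channels][height][width].

def conv1x1(x, weight, bias):
    """1x1 convolution: the same linear map over channels at every pixel.
    x: [C_in][H][W], weight: [C_out][C_in], bias: [C_out] -> [C_out][H][W]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[bias[o] + sum(weight[o][i] * x[i][r][q] for i in range(c_in)) for q in range(w)]
         for r in range(h)]
        for o in range(len(weight))
    ]

def relu(x):
    return [[[max(0.0, v) for v in row] for row in plane] for plane in x]

def global_avg_pool(x):
    """Average each channel over its spatial positions -> one value per channel."""
    return [sum(sum(row) for row in plane) / (len(plane) * len(plane[0])) for plane in x]

def linear(v, weight, bias):
    """Fully connected layer: weight [out][in], bias [out]."""
    return [bias[j] + sum(weight[j][i] * v[i] for i in range(len(v))) for j in range(len(weight))]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sr_block(x, squeeze_w, squeeze_b, fc_w, fc_b, memory):
    """Squeeze x, weight the M memory blocks, and add the recalled memory to x.
    memory: list of M learned blocks, each shaped like x (assumption of this sketch).
    Returns (output, memory_weights)."""
    squeezed = relu(conv1x1(x, squeeze_w, squeeze_b))  # squeeze: 1x1 conv + ReLU
    descriptor = global_avg_pool(squeezed)             # one value per squeezed channel
    weights = softmax(linear(descriptor, fc_w, fc_b))  # one weight per memory block
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out = [
        [[x[k][r][q] + sum(a * blk[k][r][q] for a, blk in zip(weights, memory))
          for q in range(w)]
         for r in range(h)]
        for k in range(c)
    ]
    return out, weights

# Usage: C=2, H=W=2, one squeezed channel, two memory blocks (all ones and all threes).
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, -1.0], [5.0, 6.0]]]
ones = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
threes = [[[3.0] * 2 for _ in range(2)] for _ in range(2)]
out, alpha = sr_block(x, [[1.0, 1.0]], [0.0], [[0.0], [0.0]], [0.0, 0.0], [ones, threes])
print(alpha)   # [0.5, 0.5]
print(out[0])  # [[3.0, 4.0], [5.0, 6.0]]
```

With zero FCN weights and biases, the softmax assigns equal weight to both memory blocks. The recalled memory is therefore their average (2.0 everywhere), which is added to every element of `x`. In a trained network the FCN weights would make this mixture depend on the input, which is the "adaptive re-application" the abstract describes. The residual-style addition leaves the original features intact and only augments them.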
