Stochastic Subsampling With Average Pooling

Bum Jun Kim; Sang Woo Kim

TL;DR

The paper addresses regularization of deep networks by removing the inconsistency that Dropout introduces when used with batch normalization. It proposes stochastic average pooling (SAP), a module that combines stochastic subsampling with average pooling and applies a $\sqrt{p}$ scaling, where $p$ is the keep probability, so that training-time subnetworks align with a test-time ensemble while the output size stays fixed. SAP has 1D and 2D formulations and can replace global average pooling (GAP) and other average pooling layers in existing architectures. Experiments on image classification, semantic segmentation, and object detection show consistent improvements with SAP, particularly at moderate keep probabilities (e.g., $p\approx0.5$). The work also analyzes subsampling patterns and finds that channel-shared randomness without strong spatial constraints gives the most robust regularization benefit.

Abstract

Regularization of deep neural networks is important for achieving higher generalization performance without overfitting. Although the popular method of Dropout provides a regularization effect, it causes inconsistent properties in the output, which may degrade the performance of deep neural networks. In this study, we propose a new module called stochastic average pooling, which incorporates Dropout-like stochasticity in pooling. We describe the properties of stochastic subsampling and average pooling and leverage them to design a module without any inconsistency problem. Stochastic average pooling achieves a regularization effect without the performance degradation that the inconsistency issue can cause, and it can easily be plugged into existing architectures of deep neural networks. Experiments demonstrate that replacing existing average pooling with stochastic average pooling yields consistent improvements across a variety of tasks, datasets, and models.
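To make the mechanism concrete, here is a minimal 1D sketch in plain Python. It is not the authors' implementation. It assumes that the training phase pools the subsampled vector with a window of `p * pool_size` (so the output length matches the test phase), that `p * pool_size` is an integer, and that retained indices are drawn uniformly without replacement and kept in their original order. The function and parameter names are hypothetical.

```python
import math
import random


def average_pool_1d(x, pool_size):
    """Non-overlapping average pooling over a list of floats."""
    return [
        sum(x[i:i + pool_size]) / pool_size
        for i in range(0, len(x), pool_size)
    ]


def stochastic_average_pool_1d(x, pool_size, p, training, rng=random):
    """Sketch of stochastic average pooling for a 1D vector (assumptions above)."""
    if len(x) % pool_size != 0:
        raise ValueError("len(x) must be divisible by pool_size")

    if not training:
        # Test phase: plain average pooling over the full vector.
        return average_pool_1d(x, pool_size)

    # Training phase: smaller window so the output length stays fixed.
    train_pool = round(p * pool_size)
    if train_pool < 1 or not math.isclose(train_pool, p * pool_size):
        raise ValueError("p * pool_size must be a positive integer")
    keep = train_pool * (len(x) // pool_size)  # number of retained elements

    # Assumption: uniform sampling without replacement, original order kept.
    idx = sorted(rng.sample(range(len(x)), keep))
    sub = [x[i] for i in idx]

    # Pool the shorter vector, then apply the sqrt(p) scaling.
    return [math.sqrt(p) * v for v in average_pool_1d(sub, train_pool)]
```

With `training=False` the function reduces to ordinary average pooling, so swapping it in for an existing average pooling layer leaves inference unchanged; only the training phase is stochastic.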

Paper Structure (12 sections, 7 equations, 8 figures, 9 tables)

Introduction
Method
Preliminaries: Dropout and PatchDropout
Proposed Method: Stochastic Average Pooling
Empirical Observation
Experiments
Replace GAP in Classifier Head
Replace GAP in SE Block
Replace Average Pooling in Semantic Segmentation Networks
Replace GAP in Object Detection Networks
Discussion
Conclusion

Figures (8)

Figure 1: Illustration of Dropout and stochastic subsampling for a keep probability of $p=0.5$. Dropout sets half of the elements to zero and scales the vector by $1/p$, which increases the variance. Stochastic subsampling yields a subvector of the input, which conserves variance but reduces the size of the vector.
Figure 2: Illustration of stochastic average pooling during training and test phases for a 1D vector.
Figure 3: Illustration of stochastic average pooling for a 2D image feature. We flatten the spatial dimensions and apply stochastic average pooling to each channel.
Figure 4: During the training phase, stochastic average pooling behaves as average pooling on a subnetwork that is randomly sampled. During the test phase, it operates as vanilla average pooling that becomes an ensemble of all possible subnetworks.
Figure 5: Simulation results for the second moment after stochastic average pooling. To obtain a consistent second moment during training and test phases, $\sqrt{p}$ scaling should be applied.
...and 3 more figures
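
The variance statements in the captions of Figures 1 and 5 can be checked with a short second-moment calculation. This is a sketch under assumptions that the captions do not state: the inputs $x_i$ are i.i.d. with zero mean and second moment $\sigma^2$, the Dropout mask $m_i \sim \mathrm{Bernoulli}(p)$ is independent of $x_i$, the test-time pooling window is $r$, and the training-time window over the subsampled vector is $rp$ (taken to be an integer). The symbols $\sigma^2$, $m_i$, and $r$ are introduced here for illustration only.

$$
\mathbb{E}\left[\left(\frac{m_i x_i}{p}\right)^2\right] = \frac{\mathbb{E}[m_i^2]\,\mathbb{E}[x_i^2]}{p^2} = \frac{p\,\sigma^2}{p^2} = \frac{\sigma^2}{p}
$$

Dropout therefore inflates the second moment by a factor of $1/p$, whereas subsampling leaves each retained element unchanged, with second moment $\sigma^2$. For pooling, with $x_{i_1}, \dots, x_{i_{rp}}$ the retained elements in one training-time window:

$$
\mathbb{E}\left[\left(\frac{1}{r}\sum_{i=1}^{r} x_i\right)^2\right] = \frac{\sigma^2}{r},
\qquad
\mathbb{E}\left[\left(\frac{\sqrt{p}}{rp}\sum_{j=1}^{rp} x_{i_j}\right)^2\right] = p \cdot \frac{\sigma^2}{rp} = \frac{\sigma^2}{r}
$$

Under these assumptions, averaging over the smaller window alone would give $\sigma^2/(rp)$, and the $\sqrt{p}$ factor is what brings the training-phase second moment back to the test-phase value $\sigma^2/r$.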
