Deep Generalized Max Pooling
Vincent Christlein, Lukas Spranger, Mathias Seuret, Anguelos Nicolaou, Pavel Král, Andreas Maier
TL;DR
This work tackles a bias in global pooling methods, which pool each activation map independently. It introduces Deep Generalized Max Pooling (DGMP), a differentiable layer that treats the activations at each spatial location as a local descriptor along the depth dimension and weights these descriptors so that their contributions are balanced. DGMP recasts Generalized Max Pooling (GMP) as a neural-network layer: it solves a ridge-regression objective to compute the weights and outputs a unit-norm global descriptor, with a single learnable parameter λ governing the pooling. Empirically, DGMP outperforms global average and global max pooling on writer identification (ICDAR17 Historical-WI) and script type classification (CLAMM'16, CLAMM'17), while remaining lightweight and end-to-end trainable. The approach yields stronger, more robust representations for structured historical documents and may extend to other applications such as word spotting. The code is publicly available.
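The ridge-regression objective mentioned above can be written out as a short derivation. The notation is an assumption made for this summary rather than taken from the paper: Φ collects the N local descriptors (one per spatial location, each of depth D) as columns, ξ is the pooled descriptor, and α holds the per-descriptor weights. The equality between the primal and dual forms is the standard ridge-regression identity.

```latex
% Assumed notation: \Phi = [\phi_1, \dots, \phi_N] \in \mathbb{R}^{D \times N},
% \mathbf{1}_N is the all-ones vector of length N.
\begin{align}
\xi^{*} &= \arg\min_{\xi} \; \lVert \Phi^{\top}\xi - \mathbf{1}_N \rVert_2^2
           + \lambda \lVert \xi \rVert_2^2 \\
        &= (\Phi\Phi^{\top} + \lambda I_D)^{-1}\,\Phi\,\mathbf{1}_N \\
        &= \Phi\alpha, \qquad
           (\Phi^{\top}\Phi + \lambda I_N)\,\alpha = \mathbf{1}_N \\
\hat{\xi} &= \xi^{*} / \lVert \xi^{*} \rVert_2
\end{align}
```

The target vector of ones asks every local descriptor to have the same similarity to the pooled vector, which is what equalizes frequent and rare descriptors; λ trades this constraint off against the norm of ξ.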
Abstract
Global pooling layers are an essential part of Convolutional Neural Networks (CNNs). In several state-of-the-art CNNs, they aggregate the activations of all spatial locations into a fixed-size vector. Global average pooling and global max pooling are commonly used to convert the convolutional features of variable-size images into a fixed-size embedding. However, both pooling types are computed spatially independently: each activation map is pooled on its own, so activations of different locations are pooled together. In contrast, we propose Deep Generalized Max Pooling, which balances the contribution of all activations of a spatially coherent region by re-weighting all descriptors so that the impact of frequent and rare ones is equalized. We show that this layer is superior to both average and max pooling for the classification of Latin medieval manuscripts (CLAMM'16, CLAMM'17), as well as for writer identification (Historical-WI'17).
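The re-weighting described above can be illustrated with a small NumPy sketch of the forward pass. This is not the authors' implementation: the function names, the (depth, height, width) layout, and the toy feature map are assumptions made for illustration, λ is fixed here instead of learned, and a real layer would live in a framework with automatic differentiation.

```python
import numpy as np

def average_pool(fmap):
    """Global average pooling: mean of each activation map on its own."""
    return fmap.reshape(fmap.shape[0], -1).mean(axis=1)

def dgmp_pool(fmap, lam=1e-3, normalize=True):
    """Sketch of generalized max pooling as a ridge-regression solve.

    fmap: array of shape (D, H, W); each spatial location contributes one
    D-dimensional local descriptor (assumed layout).
    lam:  ridge regularizer (learnable in the paper, fixed in this sketch).
    """
    d = fmap.shape[0]
    phi = fmap.reshape(d, -1)                  # D x N, one column per location
    n = phi.shape[1]
    kernel = phi.T @ phi                       # N x N Gram matrix
    # Dual ridge solution: (K + lam * I) alpha = 1
    alpha = np.linalg.solve(kernel + lam * np.eye(n), np.ones(n))
    xi = phi @ alpha                           # weighted sum of descriptors
    if normalize:
        xi = xi / max(np.linalg.norm(xi), 1e-12)
    return xi

# Toy feature map (hypothetical): 8 locations share the descriptor [1, 0],
# a single location carries the rare descriptor [0, 1].
fmap = np.zeros((2, 3, 3))
fmap[0] = 1.0
fmap[0, 2, 2] = 0.0
fmap[1, 2, 2] = 1.0

avg = average_pool(fmap)                       # dominated by the frequent descriptor
xi_raw = dgmp_pool(fmap, normalize=False)      # both components close to 1
xi = dgmp_pool(fmap)                           # unit-norm global descriptor
print("average:", avg)
print("dgmp:   ", xi)
```

In this toy case average pooling weights the frequent descriptor eight times more than the rare one (8/9 versus 1/9), whereas the ridge solution assigns both nearly the same contribution. Solving the N x N dual system is convenient when the number of locations is small; for large feature maps the equivalent D x D primal system may be cheaper.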
