How Much Position Information Do Convolutional Neural Networks Encode?

Md Amirul Islam; Sen Jia; Neil D. B. Bruce

TL;DR

The paper investigates whether CNNs encode absolute position information despite their local receptive fields. It proposes PosENet, a readout network that predicts position maps from frozen encoder features, supervised by synthetic ground-truth position maps. These ground-truth maps are content-independent (they depend only on pixel location, not on what the image shows), and only the readout is trained, using a pixel-wise loss between predicted and ground-truth maps. Across several pretrained backbones and data types, the study finds strong evidence that absolute position information is encoded, identifies zero-padding at image borders as a major source of it, and finds that deeper features carry stronger position signals. These findings challenge the assumption that CNNs are spatially invariant and have implications for location-sensitive tasks and for understanding how padding shapes feature representations.
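To make "content-independent supervision" concrete, the sketch below builds target maps that are functions of pixel coordinates alone and scores a prediction against them with a pixel-wise error. It is a minimal sketch under stated assumptions: the map shapes (horizontal gradient, vertical gradient, centred Gaussian), the [0, 1] value range, the use of mean squared error as the pixel-wise loss, and all function names are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def horizontal_gradient(h, w):
    """Target that rises linearly from 0 at the left edge to 1 at the right edge.

    Assumption: gradient-style target in [0, 1]; the paper's exact maps may differ.
    """
    row = np.linspace(0.0, 1.0, w)
    return np.tile(row, (h, 1))  # every row is identical, shape (h, w)


def vertical_gradient(h, w):
    """Target that rises linearly from 0 at the top edge to 1 at the bottom edge."""
    col = np.linspace(0.0, 1.0, h)[:, None]
    return np.tile(col, (1, w))  # every column is identical, shape (h, w)


def gaussian_map(h, w, sigma=0.25):
    """Target peaking at the image centre (hypothetical sigma, in normalized units)."""
    ys = np.linspace(-0.5, 0.5, h)[:, None]
    xs = np.linspace(-0.5, 0.5, w)[None, :]
    return np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))  # broadcasts to (h, w)


def pixelwise_mse(pred, target):
    """Mean squared error over all pixels (assumed choice of pixel-wise loss)."""
    return float(np.mean((pred - target) ** 2))


# Usage: the same target is paired with every input image, whatever it depicts.
target = horizontal_gradient(28, 28)
constant_guess = np.full_like(target, 0.5)  # a readout that ignores position
loss = pixelwise_mse(constant_guess, target)
```

Because each target depends only on (row, column), a readout on frozen features can push the loss below that of a position-blind constant guess only if the features themselves carry location information.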

Abstract

In contrast to fully connected networks, Convolutional Neural Networks (CNNs) achieve efficiency by learning weights associated with local filters with a finite spatial extent. An implication of this is that a filter may know what it is looking at, but not where it is positioned in the image. Information concerning absolute position is inherently useful, and it is reasonable to assume that deep CNNs may implicitly learn to encode this information if there is a means to do so. In this paper, we test this hypothesis, revealing the surprising degree of absolute position information that is encoded in commonly used neural networks. A comprehensive set of experiments shows the validity of this hypothesis and sheds light on how and where this information is represented, while offering clues to where positional information is derived from in deep CNNs.
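The premise that a local filter sees content but not location, and the TL;DR's point that zero-padding is a major source of position information, can be illustrated with a toy experiment. This is an assumption-laden sketch, not an experiment from the paper: a hypothetical 3x3 averaging filter is applied to a constant image, so any variation in the output can only come from position. Without padding ("valid" convolution) every output is identical; with zero-padding the border outputs come out smaller, and stacking layers spreads that border signal inward. Function names are hypothetical, and, like most CNN libraries, the code computes cross-correlation.

```python
import numpy as np


def conv2d_valid(img, kernel):
    """Cross-correlate without padding; output shrinks by (k - 1) per axis."""
    kh, kw = kernel.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out


def conv2d_zero_pad(img, kernel):
    """Same-size cross-correlation (odd kernels) with zeros added around the border."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)),
                    mode="constant", constant_values=0.0)
    return conv2d_valid(padded, kernel)


def unaffected_pixels(size, layers, kernel):
    """Count pixels still equal to 1 after stacking zero-padded layers on a constant image."""
    x = np.ones((size, size))
    for _ in range(layers):
        x = conv2d_zero_pad(x, kernel)
    return int(np.sum(np.isclose(x, 1.0)))


box = np.ones((3, 3)) / 9.0            # hypothetical 3x3 averaging filter
flat = np.ones((5, 5))                 # constant input: no content cues at all
no_pad = conv2d_valid(flat, box)       # all entries equal: position is invisible
zero_pad = conv2d_zero_pad(flat, box)  # corner 4/9, edge 6/9, interior 1: position leaks
```

On a 7x7 constant input, `unaffected_pixels` returns 25, 9, and then 1 for one, two, and three stacked layers: each layer lets the border signal reach one pixel further inward. This offers one hedged intuition, not the paper's analysis, for why deeper layers, whose receptive fields reach the border from more locations, could carry stronger position signals.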