Efficient Representation of Natural Image Patches
Cheng Guo
TL;DR
The paper proposes an abstract information processing framework (the IPU), built on four assumptions, to study early visual processing under two objectives: maximizing information transmission and modeling the input probability distribution. It shows that these goals are generally distinct. It derives $D_{KL}(p\|q)=H_q-H_p$, where $H_p$ is the entropy of the input distribution $p$ and $H_q$ is the cross-entropy of the model distribution $q$ under $p$, and shows that maximizing $H_Q$ does not necessarily minimize $H_q$. It then advocates even coding as a compromise between the two objectives. The author demonstrates two concrete loss formulations for discrete inputs (with single and multiple output dimensions) and extends the approach to image patches by using real-valued outputs with a repulsive loss, which yields near-binary, evenly distributed representations that preserve luminance, color, and edge information. In a preliminary comparison with a contemporary deep learning model, the unsupervised even-coding IPU for image patches achieves comparable perceptual structure with a significant efficiency advantage and without preprocessing, offering insights into biological coding and a potential route to more efficient deep learning models.
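As a quick numerical check of the identity $D_{KL}(p\|q)=H_q-H_p$, the sketch below computes all three quantities for a small hypothetical input distribution $p$ and a uniform model $q$ (both chosen purely for illustration, not taken from the paper), treating $H_q$ as the cross-entropy $-\sum_x p(x)\log q(x)$. Because $H_p$ is fixed by the input statistics, minimizing $H_q$ over models is equivalent to minimizing $D_{KL}(p\|q)$.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H_p = -sum p log2 p, skipping zero-probability states."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def cross_entropy(p, q):
    """Cross-entropy H_q = -sum p log2 q: expected code length (bits) when
    data drawn from p are coded with a code optimized for the model q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(q[nz]))

def kl_divergence(p, q):
    """D_KL(p || q) = sum p log2(p / q), in bits."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / q[nz]))

# Hypothetical 4-state input distribution p and a uniform model q.
p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])

H_p = entropy(p)           # 1.75 bits
H_q = cross_entropy(p, q)  # 2.0 bits
print(kl_divergence(p, q), H_q - H_p)  # both equal 0.25 bits
```

The extra 0.25 bits is the cost of coding $p$-distributed inputs with the mismatched model $q$; it vanishes only when $q=p$.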
Abstract
Utilizing an abstract information processing model based on minimal yet realistic assumptions inspired by biological systems, we study how to achieve the early visual system's two ultimate objectives: efficient information transmission and accurate modeling of the probability distribution of sensor inputs. We prove that, in general, optimizing for information transmission does not guarantee optimal modeling of this probability distribution. We illustrate, using a two-pixel (2D) system and image patches, that an efficient representation can be realized through a nonlinear population code driven by two types of biologically plausible loss functions that depend solely on the output. After unsupervised learning, our abstract information processing model bears a remarkable resemblance to biological systems, despite not mimicking many features of real neurons, such as spiking activity. A preliminary comparison with a contemporary deep learning model suggests that our model offers a significant efficiency advantage. Our model provides novel insights into the computational theory of early visual systems as well as a potential new approach to enhancing the efficiency of deep learning models.
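A minimal sketch of a loss that depends solely on the output, assuming a Gaussian pairwise repulsion $\exp(-\|y_i-y_j\|^2/2\sigma^2)$ averaged over a batch of sigmoid-bounded outputs. This kernel, the one-layer encoder in the hypothetical `train_encoder`, and the synthetic correlated two-pixel inputs are illustrative stand-ins, not the paper's actual loss functions or architecture. The loss never looks at the input, only at how spread out the outputs are.

```python
import numpy as np

def sigmoid(a):
    a = np.clip(a, -60.0, 60.0)  # avoid overflow in np.exp for very large |a|
    return 1.0 / (1.0 + np.exp(-a))

def repulsive_loss(y, sigma=0.5):
    """Hypothetical output-only loss: mean Gaussian repulsion over all distinct
    pairs of outputs. Lower values mean the outputs are more spread out."""
    diff = y[:, None, :] - y[None, :, :]           # (N, N, d) pairwise differences
    k = np.exp(-np.sum(diff**2, axis=-1) / (2 * sigma**2))
    n = y.shape[0]
    return (k.sum() - n) / (n * (n - 1))           # drop the diagonal (k_ii = 1)

def repulsive_grad(y, sigma=0.5):
    """Analytic gradient of repulsive_loss with respect to the outputs y."""
    diff = y[:, None, :] - y[None, :, :]
    k = np.exp(-np.sum(diff**2, axis=-1) / (2 * sigma**2))
    n = y.shape[0]
    # Each pair appears twice (k_ij and k_ji); diagonal terms vanish since diff_ii = 0.
    return -2.0 * np.einsum("ij,ijd->id", k, diff) / (sigma**2 * n * (n - 1))

def train_encoder(x, out_dim=2, steps=300, lr=0.5, sigma=0.5, seed=0):
    """Fit a hypothetical encoder y = sigmoid(x @ w + b) by gradient descent
    on the output-only repulsive loss."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=(x.shape[1], out_dim))
    b = np.zeros(out_dim)
    history = []
    for _ in range(steps):
        y = sigmoid(x @ w + b)
        history.append(repulsive_loss(y, sigma))
        g_pre = repulsive_grad(y, sigma) * y * (1.0 - y)  # chain rule through the sigmoid
        w -= lr * x.T @ g_pre
        b -= lr * g_pre.sum(axis=0)
    return w, b, history

rng = np.random.default_rng(1)
# Hypothetical correlated "two-pixel" inputs: neighboring pixels tend to be similar.
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
x = rng.multivariate_normal(np.zeros(2), cov, size=200)
w, b, history = train_encoder(x)
y = sigmoid(x @ w + b)
print(history[0], history[-1])          # loss before and after training
print(np.mean((y < 0.1) | (y > 0.9)))   # fraction of near-binary output values
```

Because the sigmoid confines every output to the unit hypercube, pushing outputs apart tends to drive them toward the faces and corners of the cube, which is one way near-binary codes can emerge; the printed fraction lets you inspect that tendency in this toy setup.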
