Efficient Representation of Natural Image Patches

Cheng Guo

TL;DR

The paper proposes an abstract IPU framework, built on four assumptions, to study early visual processing under two objectives: maximizing information transmission and modeling the input distribution. It shows that these goals are distinct in general: it derives $D_{KL}(p\|q)=H_q-H_p$ and shows that maximizing the output entropy $H_Q$ does not necessarily minimize $H_q$, and it therefore advocates even coding as a compromise. The authors demonstrate two concrete loss formulations for discrete inputs (one output dimension and multiple output dimensions) and extend the approach to image patches, using real-valued outputs with a repulsive loss that yields near-binary, evenly distributed representations while preserving luminance, color, and edge information. In a preliminary comparison with a pretrained deep learning model (VGG16), the unsupervised even-coding IPU achieves comparable perceptual structure with substantially higher efficiency (96 binary outputs versus 128 floating-point outputs) and without preprocessing, offering insights into biological coding and a potential route to more efficient deep learning models.
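
The relation $D_{KL}(p\|q)=H_q-H_p$ and the gap between the two objectives can be checked numerically on a toy case. The sketch below is illustrative only and rests on an assumption not stated in this summary: that the model distribution $q(x)$ spreads each output state's probability $Q(y)$ uniformly over the input states mapped to it. The four-state distribution `p`, the two partitions, and all function names are hypothetical.

```python
import numpy as np

def entropy(dist):
    """Shannon entropy in bits, skipping zero-probability states."""
    dist = np.asarray(dist, dtype=float)
    nz = dist[dist > 0]
    return float(-(nz * np.log2(nz)).sum())

def ipu_model(p, f, n_out):
    """Return (Q, q) for input distribution p and many-to-one map f.

    Assumption: q(x) = Q(f(x)) / (number of inputs sharing that output),
    i.e. q is uniform within each output group.
    """
    p = np.asarray(p, dtype=float)
    f = np.asarray(f)
    Q = np.bincount(f, weights=p, minlength=n_out)  # output distribution
    group_size = np.bincount(f, minlength=n_out)    # inputs per output state
    q = Q[f] / group_size[f]
    return Q, q

def kl_bits(p, q):
    """D_KL(p || q) in bits."""
    p = np.asarray(p, dtype=float)
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

# Hypothetical input distribution over M = 4 states, N = 2 output states.
p = np.array([0.4, 0.4, 0.1, 0.1])
partitions = {
    "max_HQ": np.array([0, 1, 0, 1]),  # balances the output: Q = [0.5, 0.5]
    "min_Hq": np.array([0, 0, 1, 1]),  # groups equal-probability inputs
}

results = {}
for name, f in partitions.items():
    Q, q = ipu_model(p, f, n_out=2)
    results[name] = {
        "H_Q": entropy(Q),
        "H_q": entropy(q),
        "kl": kl_bits(p, q),
        "H_q_minus_H_p": entropy(q) - entropy(p),
    }
    print(name, results[name])
```

In this toy case the partition with the largest output entropy ($H_Q = 1$ bit) gives $H_q = 2$ bits, whereas the other partition has a lower $H_Q$ (about 0.72 bits) but reproduces $p$ exactly, so its divergence is zero. In both cases the computed divergence matches $H_q - H_p$.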

Abstract

Utilizing an abstract information processing model based on minimal yet realistic assumptions inspired by biological systems, we study how to achieve the early visual system's two ultimate objectives: efficient information transmission and accurate sensor probability distribution modeling. We prove that optimizing for information transmission does not guarantee optimal probability distribution modeling in general. We illustrate, using a two-pixel (2D) system and image patches, that an efficient representation can be realized through a nonlinear population code driven by two types of biologically plausible loss functions that depend solely on output. After unsupervised learning, our abstract information processing model bears remarkable resemblances to biological systems, despite not mimicking many features of real neurons, such as spiking activity. A preliminary comparison with a contemporary deep learning model suggests that our model offers a significant efficiency advantage. Our model provides novel insights into the computational theory of early visual systems as well as a potential new approach to enhance the efficiency of deep learning models.

Paper Structure (29 sections, 35 equations, 14 figures)

Figures (14)

  • Figure 1: Example illustrating how an IPU models input probability $p(x)$: (a) This panel shows the example input probability distribution $p(x)$ alongside the approximation learned by the IPU model, $q(x)$, calculated using Eq. (\ref{eq:qx2}). Here, $x$ is discrete and has $M$ different states. (b) The many-to-one function $y = f(x)$ categorizes the input states into $N=7$ distinct groups within the output space. (c) With both $p(x)$ and $f(x)$ given, one can calculate the output distribution $Q(y)$ over the $N=7$ output states using Eq. (\ref{eq:Qy}).
  • Figure 2: Evenly partitioning the two-pixel probability distribution learned by multilayer perceptrons (MLPs). The X and Y axes represent the intensities $x_a$ and $x_b$ of the two pixels. The quantity $n(x_a, x_b) + 1$ is plotted in gray on a log scale, where $n(x_a, x_b)$ denotes the number of occurrences of the two-pixel values among the sampled data. Colored lines indicate the boundaries of states for each output dimension learned by an MLP, with one color representing one dimension. (a) One output dimension with 16 states, which partitions the two-pixel intensity space based on the total intensity $x_a + x_b$. (b) One output dimension with 200 states, which partitions an artificial two-pixel intensity distribution (a 2D normal distribution) into a learned hexagonal lattice. (c) Two independent output dimensions, each with 10 states, which divide the two-pixel intensity space approximately by the total intensity $x_a + x_b$ and the contrast $x_a - x_b$.
  • Figure 3: Statistical analysis of the learned representation using the node-wise loss function with $\alpha = 0.625$. (a) Histogram of the model's output values on a log scale. The vast majority of the output values are either 0 or 1, signifying that the model encodes the images using a binary representation. (b) Probability of an output node being activated by a random image patch.
  • Figure 4: Image patches with the shortest distance in the representation space to 16 randomly selected image patches. The first column displays the 16 random image patches, while the succeeding nine columns display patches that are closest to the first-column patches in the same row. (a) Distances are computed using an even coding IPU model, trained through unsupervised learning, with 96 binary number outputs. (b) Distances are computed using the first 10 layers of a convolutional neural network (VGG16) model, pretrained through supervised learning, with 128 floating-point number outputs.
  • Figure 5: Feature maps of nodes resembling local edge detectors. The first column presents the grayscale test images. Each subsequent column, except the last one, displays the feature maps corresponding to the same output node for the test images. The last column shows edges generated by the multi-stage Canny edge detector for comparison.
  • ...and 9 more figures
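
The nearest-patch lookup described for Figure 4(a) and the per-node activation probability of Figure 3(b) can be sketched for binary codes as follows. This is a minimal illustration, not the paper's procedure: the codes are random placeholders rather than learned outputs, Hamming distance is an assumed metric (the caption only says "shortest distance in the representation space"), and `nearest_by_hamming` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patches, n_bits = 1000, 96

# Placeholder codes: random bits standing in for the IPU's binary outputs.
codes = rng.integers(0, 2, size=(n_patches, n_bits), dtype=np.uint8)

def nearest_by_hamming(codes, query_idx, k=9):
    """Indices and distances of the k codes closest to codes[query_idx].

    Assumption: distance is the Hamming distance (number of differing bits).
    """
    dists = np.count_nonzero(codes != codes[query_idx], axis=1)
    dists[query_idx] = codes.shape[1] + 1  # push the query itself to the end
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]

neighbors, neighbor_dists = nearest_by_hamming(codes, query_idx=0)

# Fraction of patches that activate each output node (cf. Figure 3(b)).
activation_prob = codes.mean(axis=0)

# Packing 96 bits per patch into bytes gives 12 bytes per code.
packed = np.packbits(codes, axis=1)
print(neighbors, neighbor_dists, packed.shape)
```

For scale, a 96-bit code occupies 12 bytes, whereas 128 floating-point outputs would occupy 512 bytes if stored as 32-bit floats (the float width is an assumption; the caption does not state it).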