The Cooperative Network Architecture: Learning Structured Networks as Representation of Sensory Patterns

Pascal J. Sager, Jan M. Deriu, Benjamin F. Grewe, Thilo Stadelmann, Christoph von der Malsburg

TL;DR

The Cooperative Network Architecture (CNA) targets robust object representation: sensory inputs are encoded as coherent nets $\mathcal{N}^*$ assembled from overlapping net fragments $\mathcal{F}$, which are learned from input statistics via Hebbian plasticity. The system evolves over time in two stages: Stage 1 extracts fixed features, and Stage 2 learns and composes net fragments using competitive neurons and global attenuation, which enables figure completion and noise filtering. Experiments on simple line patterns show compositional generalization to unseen structures and robustness to Gaussian noise (up to 59%), and CNA compares favorably with autoencoders in out-of-distribution scenarios. The work presents a biologically inspired neural-coding framework that couples local feature processing with global, coherent structure formation, offering a potential path toward invariant object recognition and scalable, network-based perception; extensions to multi-area architectures are left to future work.

Abstract

We introduce the Cooperative Network Architecture (CNA), a model that represents sensory signals using structured, recurrently connected networks of neurons, termed "nets." Nets are dynamically assembled from overlapping net fragments, which are learned based on statistical regularities in sensory input. This architecture offers robustness to noise and deformation as well as generalization to out-of-distribution data, addressing challenges in current vision systems from a novel perspective. We demonstrate that net fragments can be learned without supervision and flexibly recombined to encode novel patterns, enabling figure completion and resilience to noise. Our findings establish CNA as a promising paradigm for developing neural representations that integrate local feature processing with global structure formation, providing a foundation for future research on invariant object recognition.

Paper Structure (41 sections, 33 equations, 11 figures)

Figures (11)

  • Figure 1: Representation of objects by nets. Objects are captured in their entirety by coherent nets composed of overlapping net fragments (schematically symbolized by white circles in the right panel). After training on natural images, each patch of visual space contains a complement of net fragments that represent textures that have been encountered with statistical significance. Net fragments overlap neuron-wise, and those activated by an object coalesce into a coherent net. The conundrum (stated in Olshausen2005) that object contours often cannot be found with edge detectors because gray-level contrast is lacking (as in places inside the square in the left panel) may be resolved by defining contours as the borders of the nets covering objects (or covering the background). Figure adapted from Olshausen2005, with permission.
  • Figure 2: Dynamics within the proposed CNA: The input image is fed into stage $S1$ to obtain feature activations. The feature activations are then processed by stage $S2$, together with activations from neurons within the same layer.
  • Figure 3: The response of our model to a diagonal line input is displayed as a net of activated neurons. The left of the image shows the neuronal activation $\boldsymbol{y}^{(S2)}$ when observing the line, and the right visualizes the net formed by these neurons (activations $\boldsymbol{y}^{(S2)}$ with their connections $\boldsymbol{W}^{(L)}$). For convenience, the neurons are rearranged in a circle, and the net fragments (connections) of two randomly selected neurons, $n_i$ and $n_j$, are visualized in color (purple and red).
  • Figure 4: Training and evaluation data. The top of the figure illustrates the straight lines used during the training phase. Below, various samples from the evaluation datasets are shown, including kinked lines (rows 1 and 2), digits and characters (rows 3 and 4), and line drawings (rows 5–8). For each row, two input examples, denoted as $\boldsymbol{x}_1$ and $\boldsymbol{x}_2$, are displayed alongside their respective feature activations, $\boldsymbol{y}^{(S1)}$, and neuronal activations, $\boldsymbol{y}^{(S2)}$, shown both without and with addition of Gaussian noise.
  • Figure 5: Noise filtering efficacy: Noise-corrupted feature activations $\boldsymbol{y}^{(S1)}$ alongside the corresponding output $\boldsymbol{y}^{(S2)}$, in which noise is reduced due to a lack of support within nets.
  • ...and 6 more figures
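
The intuition behind the noise filtering in Figure 5 and the figure completion mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's model or its equations: the 5x5 grid, the binary co-activation rule, the thresholded update, and every name and parameter (`SIZE`, `learn_lateral_weights`, `settle`, `input_gain`, `threshold`) are assumptions made for illustration. The sketch learns lateral connections from straight lines only, then lets activity settle so that a neuron stays active only if its input plus the support from connected active neurons is large enough.

```python
from itertools import product

SIZE = 5  # hypothetical 5x5 grid, one neuron per pixel (illustration only)


def idx(row, col):
    """Flatten a (row, col) grid position into a neuron index."""
    return row * SIZE + col


def straight_lines():
    """Training patterns: every full horizontal and vertical line."""
    patterns = []
    for k in range(SIZE):
        patterns.append({idx(k, c) for c in range(SIZE)})  # row k
        patterns.append({idx(r, k) for r in range(SIZE)})  # column k
    return patterns


def learn_lateral_weights(patterns):
    """Hebbian-style rule (assumed): connect neurons that were ever co-active."""
    n = SIZE * SIZE
    weights = [[0] * n for _ in range(n)]
    for active in patterns:
        for i, j in product(active, repeat=2):
            if i != j:
                weights[i][j] = 1
    return weights


def settle(x, weights, steps=3, input_gain=2, threshold=4):
    """Iterate a thresholded update combining input drive and lateral support."""
    n = len(x)
    y = list(x)  # start from the raw input activity
    for _ in range(steps):
        # Support = number of active neurons this neuron is connected to.
        support = [sum(weights[i][j] * y[j] for j in range(n)) for i in range(n)]
        y = [1 if input_gain * x[i] + support[i] >= threshold else 0
             for i in range(n)]
    return y


weights = learn_lateral_weights(straight_lines())

# Input: row 2 with its middle pixel missing, plus one isolated noise pixel.
x = [0] * (SIZE * SIZE)
for c in (0, 1, 3, 4):
    x[idx(2, c)] = 1
x[idx(0, 4)] = 1

y = settle(x, weights)
print(sorted(i for i, v in enumerate(y) if v))  # [10, 11, 12, 13, 14]
```

In this toy setting the isolated pixel is dropped because only one connected neuron is active alongside it, while the missing middle pixel of the line is filled in because four connected neurons support it. The threshold and gain were picked by hand to make that separation work on this grid; they carry no meaning beyond the example.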