Meta-Representational Predictive Coding: Biomimetic Self-Supervised Learning

Alexander Ororbia, Karl Friston, Rajesh P. N. Rao

TL;DR

Self-supervised learning often relies on backpropagation or pixel-level generation, which is biologically implausible. The authors propose meta-representational predictive coding (MPC), an encoder-centric predictive coding framework that learns distributed latent representations by predicting latent activity across multiple visual streams (foveal, parafoveal, peripheral) driven by sequence-based glimpses. MPC uses local Hebbian plasticity and cross-stream prediction to minimize a combined free energy without reconstructing inputs, and shows competitive downstream performance on MNIST and Kuzushiji-MNIST with good sample efficiency and meaningful latent structure. This biomimetic approach bridges neuroscience-inspired inference and scalable SSL, with potential extensions to active perception and multimodal self-supervision.

Abstract

Self-supervised learning has become an increasingly important paradigm in the domain of machine intelligence. Furthermore, evidence for self-supervised adaptation, such as contrastive formulations, has emerged in recent computational neuroscience and brain-inspired research. Nevertheless, current work on self-supervised learning relies on biologically implausible credit assignment (in the form of backpropagation of errors) and feedforward inference, typically a forward-locked pass. Predictive coding, in its mechanistic form, offers a biologically plausible means to sidestep these backprop-specific limitations. However, unsupervised predictive coding rests on learning a generative model of raw pixel input (akin to "generative AI" approaches), which entails predicting a potentially high-dimensional input; on the other hand, supervised predictive coding, which learns a mapping from inputs to target labels, requires human annotation, and thus incurs the drawbacks of supervised learning. In this work, we present a scheme for self-supervised learning within a neurobiologically plausible framework that appeals to the free energy principle, constructing a new form of predictive coding that we call meta-representational predictive coding (MPC). MPC sidesteps the need for learning a generative model of sensory input (e.g., pixel-level features) by learning to predict representations of sensory input across parallel streams, resulting in an encoder-only learning and inference scheme. This formulation rests on active inference (in the form of sensory glimpsing) to drive the learning of representations, i.e., the representational dynamics are driven by sequences of decisions made by the model to sample informative portions of its sensorium.
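To make the cross-stream idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes two streams whose latent vectors `z_a` and `z_b` are held fixed, linear lateral predictors `W_ab` and `W_ba`, a squared-error energy, and a learning rate `lr`. All of these names and choices are hypothetical simplifications; the abstract only states that each stream predicts the other's representation rather than the raw pixels.

```python
def zeros(rows, cols):
    """Hypothetical helper: a rows x cols weight matrix of zeros."""
    return [[0.0] * cols for _ in range(rows)]


def matvec(W, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def local_update(W, error, pre, lr):
    """Local rule: each weight changes by (post-synaptic error) * (pre-synaptic activity)."""
    for i, row in enumerate(W):
        for j in range(len(row)):
            row[j] += lr * error[i] * pre[j]


def cross_stream_step(z_a, z_b, W_ab, W_ba, lr=0.05):
    """One update of both lateral predictors.

    Returns the combined squared-error energy measured before the update.
    No pixel reconstruction is involved: the targets are latent vectors.
    """
    e_b = [t - p for t, p in zip(z_b, matvec(W_ab, z_a))]  # A predicts B
    e_a = [t - p for t, p in zip(z_a, matvec(W_ba, z_b))]  # B predicts A
    energy = 0.5 * (sum(e * e for e in e_a) + sum(e * e for e in e_b))
    local_update(W_ab, e_b, z_a, lr)
    local_update(W_ba, e_a, z_b, lr)
    return energy


if __name__ == "__main__":
    z_a, z_b = [1.0, 0.5, -0.5], [0.2, -0.3]  # assumed toy latent states
    W_ab, W_ba = zeros(2, 3), zeros(3, 2)
    for step in range(5):
        print(step, cross_stream_step(z_a, z_b, W_ab, W_ba))
```

In this toy setting the energy shrinks on every step because each update uses only quantities available at the synapse (the error at the target population and the activity of the predicting population). In the full scheme the latent states themselves would also be updated during inference, which this sketch omits.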

Paper Structure

This paper contains 17 sections, 13 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Illustration of two consecutive observations of an image. Shown is one of the digits processed by the MPC scheme over the course of two consecutive saccade-produced glimpses. On the left is the full source image with a red dashed box showing the approximate subspace sampled by the glimpse. The right panels, within the expanded dot-dashed rectangle, show how the sampled data within the dashed box on the left is converted into six input representations, i.e., four overlapping "fovea patches", a "parafovea patch", and a "peripheral patch".
  • Figure 2: Illustration of the message passing in MPC. For generative predictive coding (GPC) and meta-representational predictive coding (MPC), depicted is: (a) the flow/directional pattern of predictions made (solid blue arrows, which indicate neuronal populations that produce a prediction) in GPC versus MPC, and (b) the flow/direction of message passing (dashed black arcs, which indicate feedback pathways that carry prediction errors) that result from GPC versus MPC prediction patterns (in sub-figure a). Solid gray boxes indicate neuronal populations encoding latent states, while green diamonds indicate populations of error neurons. Both types of PC represent the same number of latent states; the MPC shown is an architecture of two streams where stream "A" is shown processing foveal sensory information and stream "B" processes peripheral sensory information.
  • Figure 3: Graphical depiction of a simple dual-stream MPC architecture. A meta-representational predictive coding (MPC) architecture works by processing inputs via two or more features of the sensory input, generally at different resolutions which mimic the coarseness (acuity) of the spatial features extracted by the foveal/parafoveal and the peripheral streams of the human eye. In this image, we depict a two-stream architecture, where one MPC stream produces a foveal/central representation $\mathbf{z}^{3,1}(t)$ (in its third layer) of its input at time $t$ while another circuit produces a peripheral representation $\mathbf{z}^{3,2}(t)$ of the input (also at $t$). The foveal MPC stream attempts to predict the activities of the peripheral MPC circuit and vice versa (for the $k$-th glimpse at an image). Notice that all MPC streams are conditioned by the actions, i.e., normalized x-y coordinates of the fixation point of all of the foveal/parafoveal/peripheral views, taken by a saccade over the sensory input as well as possibly their prior expectation (at time $t-1$). In this work, a fixed $K$-length saccade sequence is produced by randomly jumping across the sensory space, resulting in a perceptual input sampling policy. Green diamonds indicate error neuron populations, light-gray or orange circles with slightly darker colors within denote state cell populations, light gray arrows represent synaptic connections, dashed black circular arcs depict recurrent synapses, and blue dash-dotted arcs denote lateral cross-circuit prediction synapses (not shown, to improve visual clarity, are feedback synapses).
  • Figure 4: Structures of generative and meta-representational predictive coding schemes. Depicted are two proposed variants of predictive coding -- a "field-of-view" form of generative predictive coding (GPC-fov) and meta-representational predictive coding (MPC) -- that process the same visual scene. In this graphical example, two portions of visual input at a particular point in time are extracted (yielding a dual view, possibly containing foveal, parafoveal, or peripheral patch pixel information) by an eye movement process (represented by the pale green fat arrows), such as the involuntary saccades described in Section \ref{sec:sensory_saccades}. Specifically, we show: (a) the proposed GPC-fov (with two neural columns) processing a dual view of the sensory input, i.e., a variant of GPC that uses the same information as our MPC models; and (b) the proposed MPC (with two neural columns or streams) processing a dual view of the sensory input. Note that solid (black) arrows with open circles denote inhibitory (predictive) synapses, dashed (black) arrows with solid squares denote a population of excitatory (message-passing) synapses, purple boxes indicate a population of neurons encoding latent states and green diamonds denote a group of error neurons for a specific layer. In the zoomed-in inset for sub-Figure \ref{fig:rpc_circuit_mp}, we show the incoming and outgoing wired connections to a single neuron within a population.
  • Figure 5: Visualization of different MPC cross-circuit prediction patterns experimented with. Above are shown three possible prediction schemes for how the individual streams interact with one another; dashed blue arrows indicate a prediction direction (blue arrow head ends on prediction target) from which error messages flow backwards. Note that each "Fovea", "Parafovea", and "Peripheral" box corresponds to a particular MPC stream (from a top-down view).
  • ...and 5 more figures
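
The glimpse pipeline described in the captions of Figures 1 and 3 (a fixed-length random saccade sequence given as normalized x-y fixation points, each producing four overlapping fovea patches, one parafovea patch, and one peripheral patch) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the patch sizes (`fovea=8`, `window=12`), the zero padding at image borders, and the use of average pooling to mimic lower acuity are all hypothetical choices, since the captions above do not specify them.

```python
import random


def random_saccades(k, rng):
    """K fixation points as normalized (x, y) coordinates in [0, 1]."""
    return [(rng.random(), rng.random()) for _ in range(k)]


def crop(image, top, left, size):
    """Square crop; positions outside the image are zero-padded (assumption)."""
    h, w = len(image), len(image[0])
    return [[image[r][c] if 0 <= r < h and 0 <= c < w else 0.0
             for c in range(left, left + size)]
            for r in range(top, top + size)]


def downsample(patch, factor):
    """Average-pool by `factor` to mimic coarser acuity (assumption)."""
    n = len(patch) // factor
    return [[sum(patch[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor)) / factor ** 2
             for j in range(n)]
            for i in range(n)]


def glimpse(image, fixation, fovea=8, window=12):
    """Return six views: four fovea patches, one parafovea, one peripheral."""
    h, w = len(image), len(image[0])
    cx = round(fixation[0] * (w - 1))  # normalized x -> pixel column
    cy = round(fixation[1] * (h - 1))  # normalized y -> pixel row
    top, left = cy - window // 2, cx - window // 2
    shift = window - fovea  # corner patches overlap when fovea > window / 2
    foveae = [crop(image, top + dy, left + dx, fovea)
              for dy in (0, shift) for dx in (0, shift)]
    parafovea = downsample(crop(image, top, left, window), 2)
    peripheral = downsample(crop(image, cy - window, cx - window, 2 * window), 4)
    return foveae + [parafovea, peripheral]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[float((r + c) % 2) for c in range(28)] for r in range(28)]  # toy 28x28 input
    for fixation in random_saccades(3, rng):
        print([len(view) for view in glimpse(image, fixation)])
```

Each view would feed a separate stream, with the fixation coordinates supplied alongside it as the action signal that conditions the streams.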