Active Sensing with Predictive Coding and Uncertainty Minimization

Abdelrahman Sharafeldin, Nabil Imam, Hannah Choi

TL;DR

The paper presents a unified, end-to-end differentiable framework that couples predictive-coding-based perception with uncertainty-minimizing action to enable intrinsically driven embodied exploration. It demonstrates two instantiations of the model, one in a discrete controllable Markov chain (CMC) setting and one in a continuous active vision task with band-limited sensing, both trained without external rewards. The perception module learns a generative model of the environment via variational inference, while the action module selects informative, uncertainty-reducing actions through Bayesian Action Selection (BAS), which improves sample efficiency and data utilization in downstream classification. The work links perception and action in a scalable, interpretable way and provides code to reproduce the results, with potential applications to more complex real-world perception-action problems.

Abstract

We present an end-to-end procedure for embodied exploration inspired by two biological computations: predictive coding and uncertainty minimization. The procedure can be applied to exploration settings in a task-independent and intrinsically driven manner. We first demonstrate our approach in a maze navigation task and show that it can discover the underlying transition distributions and spatial features of the environment. Second, we apply our model to a more complex active vision task, where an agent actively samples its visual environment to gather information. We show that our model builds unsupervised representations through exploration that allow it to efficiently categorize visual scenes. We further show that using these representations for downstream classification leads to superior data efficiency and learning speed compared to other baselines while maintaining lower parameter complexity. Finally, the modularity of our model allows us to probe its internal mechanisms and analyze the interaction between perception and action during exploration.
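
As a rough illustration of uncertainty-driven exploration in a controllable Markov chain, the sketch below keeps a Dirichlet belief over next states for every (state, action) pair and always takes the action with the largest one-step expected information gain. This is a stand-in heuristic written for this summary, not the authors' BAS implementation: the Dirichlet belief, the function names, the `prior` pseudo-count, and the KL-based `missing_information` measure are all assumptions.

```python
import numpy as np

def expected_info_gain(alpha):
    """One-step expected KL between updated and current predictive distributions.

    alpha: 1-D array of positive Dirichlet pseudo-counts for one (state, action).
    """
    p = alpha / alpha.sum()
    gain = 0.0
    for s_next in range(len(alpha)):
        updated = alpha.copy()
        updated[s_next] += 1.0          # hypothetically observe s_next
        q = updated / updated.sum()
        gain += p[s_next] * np.sum(q * np.log(q / p))
    return gain

def select_action(counts, state):
    """Choose the action whose outcome is expected to be most informative."""
    gains = [expected_info_gain(counts[state, a]) for a in range(counts.shape[1])]
    return int(np.argmax(gains))

def explore(true_T, steps, seed=0, prior=1.0):
    """Run the agent in a CMC with transition tensor true_T[s, a, s']."""
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = true_T.shape
    counts = np.full((n_states, n_actions, n_states), prior)
    state = 0
    for _ in range(steps):
        action = select_action(counts, state)
        next_state = rng.choice(n_states, p=true_T[state, action])
        counts[state, action, next_state] += 1.0   # belief update
        state = next_state
    return counts

def missing_information(true_T, counts):
    """Assumed error measure: sum over (s, a) of KL(true || estimated)."""
    est = counts / counts.sum(axis=-1, keepdims=True)
    safe_true = np.where(true_T > 0, true_T, 1.0)   # treat 0 * log 0 as 0
    return float(np.sum(true_T * (np.log(safe_true) - np.log(est))))

if __name__ == "__main__":
    # Small random CMC: 4 states, 2 actions (hypothetical toy environment).
    T = np.random.default_rng(1).dirichlet(np.ones(4), size=(4, 2))
    before = missing_information(T, np.ones((4, 2, 4)))
    after = missing_information(T, explore(T, steps=500))
    print(f"missing information: {before:.3f} -> {after:.3f}")
```

Because the expected gain shrinks as pseudo-counts accumulate, the agent is pushed toward state-action pairs it has sampled least, which is the qualitative behaviour the abstract describes for the maze task.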

Paper Structure

This paper contains 43 sections, 29 equations, 14 figures, 7 tables, and 3 algorithms.

Figures (14)

  • Figure 1: Traditional versus biological models of perception and action. (a) We actively sample visual scenes to infer hidden states, in contrast to standard ML models, which assume passive perception. (b) Biological systems have an intrinsic drive to actively explore the environment and build internal models of it; in contrast, traditional RL models are primarily guided by extrinsic reward.
  • Figure 2: Generative models and architectures for the active exploration agents in (a) controllable Markov chains (CMCs) and (b) active vision. Shaded and unshaded circles represent observed and latent variables, respectively. In (b), $f^{(1)}_{\{enc,\;dec\}}$ refer to the encoder and decoder networks of the lower-level VAE, while $f^{(2)}_{\{enc,\;dec\}}$ refer to those of the higher-level VAE. Plate notation is used for the parts that are repeated for every time step $t$ up to the total number of allowed fixations $T$.
  • Figure 3: (a) Foveation setup for the band-limited sensor in the active vision task. (b) Examples from the translated MNIST dataset used in our evaluations.
  • Figure 4: Results of our active exploration model in the maze environment. (a) Missing information and percent state-action space coverage for a $6 \times 6$ maze. (b) Example visitation frequency maps for a $6 \times 6$ maze explored by BAS (our model) versus a random exploration strategy. Both agents were allowed to run in the environment for 1000 time steps. Visitation frequencies are normalized by the maximum visitation frequency in each case.
  • Figure 5: Demonstrating the generative ability of the perception model and its influence on action selection. (a) Original patches of input images (left) and their reconstructions (middle). After the model infers an abstract representation, it is able to generate an imagined digit at the unobserved locations (right). (b) Fixation sequences generated using BAS (left column) and random strategies (right column).
  • ...and 9 more figures