
Inferring response times of perceptual decisions with Poisson variational autoencoders

Hayden R. Johnson, Anastasia N. Krouglova, Hadi Vafaii, Jacob L. Yates, Pedro J. Gonçalves

TL;DR

This work introduces an image-computable perceptual decision model, PVAE-RT, that jointly learns efficient spiking representations of high-dimensional stimuli via a Poisson variational autoencoder and executes Bayesian evidence accumulation through a task-optimized decoder. An entropy-based stopping rule yields response times that capture key psychophysical regularities, including stochastic variability, right-skewed distributions, Hick’s law, and speed–accuracy trade-offs, demonstrated on MNIST. By linking efficient sensory coding with probabilistic decision dynamics under biological constraints, the approach provides a principled framework for rendering temporal aspects of perception in neural models and evaluating rapid decision behavior in complex visual tasks.
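In symbols, using the notation of Figure 1, the decision process can be sketched as follows. Reading the choice off as the most probable action at the stopping time is an assumption made here for concreteness, and $p_\theta$ denotes the learned approximate posterior. The spike-count line uses the fact that a homogeneous Poisson process with rate $\lambda_i$ produces a Poisson$(\lambda_i t)$ count by time $t$.

$$
\begin{aligned}
\boldsymbol{\lambda} &= \text{enc}_\phi(\mathbf{x}), \qquad \mathbf{z}_t \sim \mathrm{Poisson}(\boldsymbol{\lambda}\, t) \quad \text{(independently for each neuron)},\\
\mathcal{H}[p_\theta](t) &= -\sum_{\mathbf{a}} p_\theta(\mathbf{a} \mid \mathbf{z}_t)\, \log p_\theta(\mathbf{a} \mid \mathbf{z}_t),\\
\mathrm{RT} &= \min\{\, t : \mathcal{H}[p_\theta](t) \le \tau \,\}, \qquad \hat{\mathbf{a}} = \arg\max_{\mathbf{a}}\, p_\theta(\mathbf{a} \mid \mathbf{z}_{\mathrm{RT}}).
\end{aligned}
$$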

Abstract

Many properties of perceptual decision making are well-modeled by deep neural networks. However, such architectures typically treat decisions as instantaneous readouts, overlooking the temporal dynamics of the decision process. We present an image-computable model of perceptual decision making in which choices and response times arise from efficient sensory encoding and Bayesian decoding of neural spiking activity. We use a Poisson variational autoencoder to learn unsupervised representations of visual stimuli in a population of rate-coded neurons, modeled as independent homogeneous Poisson processes. A task-optimized decoder then continually infers an approximate posterior over actions conditioned on incoming spiking activity. Combining these components with an entropy-based stopping rule yields a principled and image-computable model of perceptual decisions capable of generating trial-by-trial patterns of choices and response times. Applied to MNIST digit classification, the model reproduces key empirical signatures of perceptual decision making, including stochastic variability, right-skewed response time distributions, logarithmic scaling of response times with the number of alternatives (Hick's law), and speed-accuracy trade-offs.
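A minimal single-trial sketch of the decision loop described in the abstract, under clearly stated assumptions: the trained $\mathcal{P}$-VAE encoder is replaced by a fixed rate vector, and the learned decoder $p_\theta(\mathbf{a} \mid \mathbf{z}_t)$ is replaced by a hypothetical exact Bayesian decoder that knows each class's mean firing rates (`templates`). Time is discretized into bins of width `dt`, entropy is measured in nats, and all function names and parameter values are illustrative rather than taken from the paper.

```python
import numpy as np


def posterior(z, t, templates, log_prior):
    """Posterior over K classes given spike counts z observed for t seconds.

    Hypothetical stand-in for the learned decoder p_theta(a | z_t): an exact Bayesian
    decoder that assumes class a drives neuron i at a known rate templates[a, i].
    """
    # log p(z | a) = sum_i [z_i * log(templates[a, i] * t) - templates[a, i] * t] + const;
    # the z_i * log(t) and log(z_i!) terms do not depend on a, so they are dropped.
    log_post = log_prior + z @ np.log(templates).T - t * templates.sum(axis=1)
    log_post -= log_post.max()  # numerical stability before exponentiating
    p = np.exp(log_post)
    return p / p.sum()


def entropy(p):
    """Shannon entropy in nats (zero-probability entries contribute nothing)."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def simulate_trial(rates, templates, tau, dt=0.005, t_max=2.0, rng=None):
    """One trial: accumulate Poisson spikes until posterior entropy drops to tau.

    Returns (choice, response_time). Reporting the MAP class as the choice and
    responding at t_max if the threshold is never reached are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = templates.shape[0]
    log_prior = np.full(k, -np.log(k))  # uniform prior over alternatives
    z = np.zeros(rates.shape, dtype=np.int64)
    n_steps = int(round(t_max / dt))
    for step in range(1, n_steps + 1):
        # Homogeneous Poisson process: the count in a bin of width dt is Poisson(rate * dt).
        z += rng.poisson(rates * dt)
        t = step * dt
        p = posterior(z, t, templates, log_prior)
        if entropy(p) <= tau:
            break
    return int(np.argmax(p)), t


# Usage with made-up numbers: 3 classes x 20 neurons, rates in spikes/s.
rng = np.random.default_rng(0)
templates = rng.uniform(1.0, 20.0, size=(3, 20))  # hypothetical class-conditional rates
rates = templates[1]  # pretend the encoder maps the stimulus to class 1's rates
trials = [simulate_trial(rates, templates, tau=0.2, rng=rng) for _ in range(200)]
accuracy = np.mean([choice == 1 for choice, _ in trials])
print(f"accuracy={accuracy:.3f}, mean RT={np.mean([rt for _, rt in trials]):.3f} s")
```

Because this toy decoder is exact, stopping at a low entropy means the posterior is already concentrated on one class, so accuracy at $\tau = 0.2$ should be high; the decoder described above only approximates such a posterior.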

Paper Structure

This paper contains 17 sections, 15 equations, and 7 figures.

Figures (7)

  • Figure 1: (A) $\mathcal{P}$-VAE-RT architecture. Input stimuli $\mathbf{x}$ are processed by a pretrained $\mathcal{P}$-VAE encoder, $\text{enc}_\phi(\mathbf{x})$, producing a vector of firing rates $\boldsymbol{\lambda}$. These rates drive a set of homogeneous Poisson processes that generate spike trains. As spikes arrive, an approximate Bayesian decoder continually infers the posterior distribution $p_\theta(\mathbf{a} \mid \mathbf{z}_t)$ over actions $\mathbf{a}$ from the accumulated spike count $\mathbf{z}_t$. (B) Schematic of the entropy-based stopping rule. Posterior entropy $\mathcal{H}[p_\theta]$ decreases as spikes accumulate, and the response time is modeled as the first passage time of the posterior entropy to a stopping threshold $\tau$. (C) Schematic of response distributions, generated by repeatedly simulating actions and response times for a given stimulus.
  • Figure 2: Response distributions depend on task difficulty. Each row corresponds to a digit at a given difficulty level, and each column shows one property of the response distribution across repeated trials with a fixed stimulus. Easy stimuli are characterized by rapidly decreasing entropy, strongly skewed response time distributions, and low variance in the action distribution. Difficult stimuli, by contrast, exhibit slower entropy reduction, more symmetric response time distributions, and greater variability across actions.
  • Figure 3: Hick's law. (Left) Mean response time of the $\mathcal{P}$-VAE-RT when the decoder is trained to classify among a varying number of MNIST digits. RT increases monotonically with the number of alternatives, following an approximately logarithmic trend. (Right) Human response times replotted from Hick (1952), shown in seconds.
  • Figure 4: Speed-accuracy trade-off. (Left) Model accuracy versus average RT as the entropy threshold is swept ($\tau \in \{0.1,\,0.2,\,0.4,\,0.6,\,0.8\}$). Lower $\tau$ values produce longer RTs and higher accuracy. Points represent averages across images and trials. (Right) Human response time data from a working memory task under low versus high time pressure (Heitz & Engle, 2007).
  • Figure 5: Reconstruction quality on the MNIST dataset. Reconstructions of 25 MNIST test-set images by $\mathcal{P}$-VAEs with varying latent dimensionality. Reconstruction improves as latent dimensionality increases, indicating greater representational capacity.
  • ...and 2 more figures
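Figures 1C, 2, and 4 all rest on repeated simulation with a fixed stimulus. The sketch below illustrates the threshold sweep of Figure 4 under the same kind of assumptions as before: a hypothetical exact Poisson-template decoder with a uniform prior stands in for the trained decoder, rates are arbitrary, and the deadline rule is our own choice. It simulates each spike train once and reads off a first-passage time for every threshold. Reusing the same spike trains across thresholds (common random numbers) makes the RT ordering exact: any path whose entropy has fallen to 0.1 has also fallen to 0.8, so lowering $\tau$ can only delay the response on every trial.

```python
import numpy as np


def entropy_path(rates, templates, dt, n_steps, rng):
    """Simulate one spike-count trajectory; return times, posterior entropy, and MAP class per step.

    Uses a hypothetical exact Poisson-template decoder with a uniform prior in place of
    the learned decoder p_theta(a | z_t).
    """
    # Cumulative counts z_t at t = dt, 2*dt, ..., n_steps*dt (one row per time step).
    counts = rng.poisson(rates * dt, size=(n_steps, rates.size)).cumsum(axis=0)
    t = dt * np.arange(1, n_steps + 1)
    # Class-dependent part of the Poisson log-likelihood, one row per time step.
    log_post = counts @ np.log(templates).T - t[:, None] * templates.sum(axis=1)
    log_post -= log_post.max(axis=1, keepdims=True)
    p = np.exp(log_post)
    p /= p.sum(axis=1, keepdims=True)
    h = -(p * np.log(np.clip(p, 1e-300, None))).sum(axis=1)  # entropy in nats
    return t, h, p.argmax(axis=1)


def first_passage(t, h, choices, tau):
    """First time the entropy path reaches tau; respond at the deadline otherwise (assumption)."""
    hits = np.flatnonzero(h <= tau)
    i = hits[0] if hits.size else len(t) - 1
    return int(choices[i]), float(t[i])


def sweep_tau(rates, templates, true_class, taus, n_trials=200, dt=0.005, n_steps=400, seed=0):
    """Accuracy and mean RT per threshold, reusing the same spike trains for every tau."""
    rng = np.random.default_rng(seed)
    paths = [entropy_path(rates, templates, dt, n_steps, rng) for _ in range(n_trials)]
    results = {}
    for tau in taus:
        outcomes = [first_passage(*path, tau) for path in paths]
        choices = np.array([c for c, _ in outcomes])
        rts = np.array([rt for _, rt in outcomes])
        results[tau] = (float(np.mean(choices == true_class)), float(rts.mean()))
    return results


# Usage with made-up rates (3 classes x 20 neurons); thresholds match Figure 4's sweep.
templates = np.random.default_rng(1).uniform(1.0, 20.0, size=(3, 20))
taus = [0.1, 0.2, 0.4, 0.6, 0.8]
results = sweep_tau(templates[1], templates, true_class=1, taus=taus)
for tau in taus:
    acc, mean_rt = results[tau]
    print(f"tau={tau}: accuracy={acc:.3f}, mean RT={mean_rt * 1000:.1f} ms")
```

Keeping the full entropy path per trial also gives the per-stimulus RT histograms and action frequencies of Figure 2 for free: collect `first_passage` outcomes for one $\tau$ instead of averaging them.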