Pattern theory: the mathematics of perception

David Mumford

TL;DR

Pattern Theory advances a unifying mathematical framework for perception rooted in Bayesian inference and graphical models, extending from simple HMMs to rich continuous and geometric representations. It shows how MRFs, BBP, and continuum models capture grouping, segmentation, and shape in sensory data, and surveys computational tools such as particle filtering, variational methods, and diffusion-based approaches. The work highlights challenges from non-Markov dependencies and heavy-tailed statistics, while connecting statistical perception to geometric shape analysis via diffeomorphisms and geodesic flows. Overall, it argues for a scalable, pattern-driven paradigm that could enable fully unsupervised learning and robust perception across speech and vision domains.

Abstract

Is there a mathematical theory underlying intelligence? Control theory addresses the output side, motor control, but the work of the last 30 years has made clear that perception is a matter of Bayesian statistical inference, based on stochastic models of the signals delivered by our senses and the structures in the world producing them. We will start by sketching the simplest such model, the hidden Markov model for speech, and then go on to illustrate the complications, mathematical issues and challenges that this has led to.
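The hidden Markov model mentioned above can be illustrated with a minimal sketch. The transition matrix, emission matrix, and observation alphabet below are purely illustrative toy values (they are not taken from the paper); the forward algorithm computes the likelihood of an observation sequence by summing over all hidden state paths:

```python
import numpy as np

# Toy HMM: two hidden states (think of them as hypothetical phoneme states),
# two quantized acoustic observation symbols. All numbers are illustrative.
A = np.array([[0.7, 0.3],   # transition probabilities P(x_{t+1} | x_t)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],   # emission probabilities P(y_t | x_t)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])   # initial state distribution

def forward(obs):
    """Forward algorithm: P(y_1, ..., y_T), marginalizing over hidden paths."""
    alpha = pi * B[:, obs[0]]          # joint P(x_1, y_1)
    for y in obs[1:]:
        alpha = (alpha @ A) * B[:, y]  # propagate and absorb next observation
    return alpha.sum()

print(forward([0, 0, 1]))  # likelihood of one observation sequence
```

Because the forward recursion computes an exact marginal, the likelihoods of all possible observation sequences of a fixed length sum to one, which is a convenient sanity check.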

Paper Structure

This paper contains 15 sections, 26 equations, and 6 figures.

Figures (6)

  • Figure 1: Why is this old man recognizable from a cursory glance? His outline threads a complex path amongst the cluttered background and is broken up by alternating highlights and shadows and by the wrinkles on his coat. There is no single part of this image which suggests a person unambiguously (the ear comes closest but the rest of his face can only be guessed at). No other object in the image stands out --- the man's cap, for instance, could be virtually anything. Statistical methods, first grouping contours, secondly guessing at likely illumination effects and finally using probable models of clothes may draw him out. No known computer algorithm comes close to finding a man in this image.
  • Figure 2: Work of Blake and Isard tracking three faces in a moving image sequence. The curves represent estimates of the posterior probability distributions for faces at each location obtained by smoothing the weighted sum of delta functions at the 'particles'. Note how multi-modal these are and how the tracker recovers from the temporary occlusion of one face by another.
  • Figure 3: Grouping in language and vision: On top, parsing the not quite grammatical speech of a 2 1/2 year old Helen describing her own intentions ([H]): above the sentence, a context-free parse tree; below it, longer range non-Markov links --- the identity 'cake'='some'='it' and the unification of the two parts 'Helen's going to' = '(I) am going to'. On the bottom, 2 kinds of grouping with an iso-intensity contour of the image in Figure 1: note the broken but visible contour of the back marked by 'A' and the occluded contours marked by 'B' and 'C' behind the man.
  • Figure 4: Statistical mechanics can be applied to the segmentation of images. On the top left, a rural scene taken as the external magnetic field, with its intensity scaled so that dark areas are negative, light areas are positive. At the top right, the mode or ground state of the Ising model. Along the bottom, the Gibbs distribution is sampled at a decreasing sequence of temperatures, discovering the global pattern bit by bit.
  • Figure 5: On the left, an image of the texture of a Cheetah's hide; in the middle, a synthetic image from the Gaussian model with the same second-order statistics; on the right, a synthetic image in which the full distribution on 7 filter statistics is reproduced by an exponential model.
  • ...and 1 more figure
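The procedure described in the Figure 4 caption (treat the image intensities as the external magnetic field of an Ising model, then sample the Gibbs distribution at a decreasing sequence of temperatures so the global pattern emerges bit by bit) can be sketched as a simple simulated-annealing Gibbs sampler. The field values, coupling constant, grid size, and temperature schedule below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image" as external field h: dark pixels map to negative values,
# light pixels to positive ones, as in the Figure 4 setup.
h = np.where(rng.random((32, 32)) > 0.5, 1.0, -1.0) * 0.5
J = 1.0                                 # nearest-neighbor coupling (assumed)
s = rng.choice([-1, 1], size=h.shape)   # random initial spin configuration

def gibbs_sweep(s, h, J, T):
    """One sweep of single-site Gibbs updates on the Ising model."""
    n, m = s.shape
    for i in range(n):
        for j in range(m):
            # sum of the four nearest-neighbor spins (free boundary conditions)
            nb = (s[i - 1, j] if i > 0 else 0) + (s[i + 1, j] if i < n - 1 else 0) \
               + (s[i, j - 1] if j > 0 else 0) + (s[i, j + 1] if j < m - 1 else 0)
            field = J * nb + h[i, j]
            # conditional probability that the spin is +1 given its neighbors
            p_up = 1.0 / (1.0 + np.exp(-2.0 * field / T))
            s[i, j] = 1 if rng.random() < p_up else -1
    return s

# Anneal: sample at decreasing temperatures, discovering the segmentation
# pattern gradually, as in the bottom row of Figure 4.
for T in [4.0, 2.0, 1.0, 0.5]:
    for _ in range(5):
        s = gibbs_sweep(s, h, J, T)
```

At high temperature the samples are nearly random; as the temperature drops, the coupling term smooths the configuration toward large coherent regions aligned with the field, approaching the mode (ground state) shown at the top right of Figure 4.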