
Hybrid Lie semi-group and cascade structures for the generalized Gaussian derivative model for visual receptive fields

Tony Lindeberg

TL;DR

This work addresses the variability of visual image structures caused by the geometric image transformations that arise under different viewing conditions. It does so by formulating covariant receptive fields in terms of a multi-parameter generalized Gaussian derivative model. It develops two complementary theoretical strands: (i) infinitesimal relations for spatial and spatio-temporal smoothing, which resemble hybrid Lie semi-group generators and are expressed in terms of generalized Hermite polynomials, and (ii) macroscopic cascade smoothing relations, which describe how receptive field responses at coarser scales can be computed from responses at finer scales. The results cover purely spatial, isotropic spatio-temporal and affine spatio-temporal smoothing, including time-causal variants based on the time-causal limit kernel. They also provide explicit relations for transforming the filter parameters, together with the incremental kernels needed for cascade implementations. These contributions enable more efficient computation of covariant receptive field responses over filter banks and offer foundational insights for modeling simple-cell computations in biological vision, with potential implications for geometric deep learning and for visual processing that is robust across viewing conditions.
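The time-causal limit kernel mentioned above can be approximated by a finite cascade of first-order integrators. The following is a minimal sketch, not the paper's implementation. It assumes the geometric distribution of time constants $\mu_k = c^{-k}\sqrt{c^2-1}\,\sqrt{\tau}$ from Lindeberg's earlier definition of the limit kernel, which is not stated in this excerpt. The helper names `limit_kernel_time_constants` and `truncated_limit_kernel`, the truncation level and the sampling step are hypothetical choices. Each exponential stage has variance $\mu_k^2$, so the variance of the cascade approaches $\tau$ as the number of stages grows.

```python
import numpy as np


def limit_kernel_time_constants(tau, c, num_levels):
    """Assumed time constants mu_k = c**(-k) * sqrt(c**2 - 1) * sqrt(tau), k = 1..num_levels."""
    k = np.arange(1, num_levels + 1, dtype=float)
    return c ** (-k) * np.sqrt(c**2 - 1.0) * np.sqrt(tau)


def truncated_limit_kernel(tau, c, num_levels, dt=0.005, t_max=20.0):
    """Impulse response of a cascade of first-order recursive smoothing stages.

    Stage k computes y[n] = q * y[n - 1] + (1 - q) * x[n] with q = exp(-dt / mu_k),
    a sampled counterpart of the continuous integrator exp(-t / mu_k) / mu_k.
    Each stage uses only present and past input, so the cascade is time-causal.
    """
    n_samples = int(round(t_max / dt))
    response = np.zeros(n_samples)
    response[0] = 1.0  # unit impulse at t = 0
    for mu in limit_kernel_time_constants(tau, c, num_levels):
        q = np.exp(-dt / mu)
        smoothed = np.empty_like(response)
        state = 0.0
        for i, value in enumerate(response):
            state = q * state + (1.0 - q) * value
            smoothed[i] = state
        response = smoothed
    return dt * np.arange(n_samples), response


t, kernel = truncated_limit_kernel(tau=1.0, c=np.sqrt(2.0), num_levels=6)
mass = kernel.sum()
mean = np.sum(t * kernel) / mass
variance = np.sum((t - mean) ** 2 * kernel) / mass
# The variance should be close to tau * (1 - c**(-2 * num_levels)) = 0.984375 here
print(f"mass={mass:.6f} mean={mean:.4f} variance={variance:.4f}")
```

The recursive form is used because it needs only one state variable per stage, which is what makes time-causal cascades cheap to run on streaming input.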

Abstract

Because of the variabilities of real-world image structures under the natural image transformations that arise when observing similar objects or spatio-temporal events under different viewing conditions, the receptive field responses computed in the earliest layers of the visual hierarchy may be strongly influenced by such geometric image transformations. One way of handling this variability is to base the vision system on covariant receptive field families, which expand the receptive field shapes over the degrees of freedom in the image transformations. This paper addresses the problem of deriving relationships between spatial and spatio-temporal receptive field responses obtained for different values of the shape parameters in the resulting multi-parameter families of receptive fields. For this purpose, we derive both (i) infinitesimal relationships, roughly corresponding to a combination of notions from semi-groups and Lie groups, and (ii) macroscopic cascade smoothing properties, which describe how receptive field responses at coarser spatial and temporal scales can be computed by applying smaller-support incremental filters to the output from corresponding receptive fields at finer spatial and temporal scales. These cascade properties are structurally related to the notion of Lie algebras, although with directional preferences. The presented results provide (i) a deeper understanding of the relationships between spatial and spatio-temporal receptive field responses for different values of the filter parameters, which can be used for both (ii) designing more efficient schemes for computing receptive field responses over populations of multi-parameter families of receptive fields, and (iii) formulating idealized theoretical models of the computations of simple cells in biological vision.
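The cascade smoothing property and its directional preference can be illustrated for purely spatial affine Gaussian kernels. The sketch below parameterizes each kernel directly by its covariance matrix, which is an assumption for illustration. It checks numerically that $g(\cdot;\; \Sigma_2) = g(\cdot;\; \Sigma_2 - \Sigma_1) * g(\cdot;\; \Sigma_1)$ on the kernels themselves, which is equivalent to the response to an impulse. This identity requires the incremental covariance $\Sigma_2 - \Sigma_1$ to be positive semi-definite. All function names, covariance values and grid settings are hypothetical. The FFT-based circular convolution is only a convenient numerical stand-in for the continuous convolution.

```python
import numpy as np


def affine_gauss(x1, x2, cov):
    """Sampled 2-D affine Gaussian with covariance matrix cov."""
    cov_inv = np.linalg.inv(cov)
    quad = cov_inv[0, 0] * x1**2 + 2.0 * cov_inv[0, 1] * x1 * x2 + cov_inv[1, 1] * x2**2
    return np.exp(-quad / 2.0) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))


def incremental_covariance(cov_fine, cov_coarse, tol=1e-12):
    """Covariance of the incremental kernel taking scale cov_fine to scale cov_coarse.

    The step is only possible when cov_coarse - cov_fine is positive
    semi-definite; otherwise it would require "un-smoothing" in some direction.
    """
    delta = cov_coarse - cov_fine
    if np.min(np.linalg.eigvalsh(delta)) < -tol:
        raise ValueError("cov_coarse - cov_fine is not positive semi-definite")
    return delta


def convolve_centered(a, b, h):
    """Circular FFT convolution of two arrays sampled on the same centered grid."""
    fa = np.fft.fft2(np.fft.ifftshift(a))
    fb = np.fft.fft2(np.fft.ifftshift(b))
    return np.fft.fftshift(np.real(np.fft.ifft2(fa * fb))) * h * h


h, n = 0.25, 128
grid = h * (np.arange(n) - n // 2)  # x = 0 lies at index n // 2
X1, X2 = np.meshgrid(grid, grid)

cov_fine = np.array([[1.0, 0.3], [0.3, 1.5]])
cov_coarse = np.array([[3.0, 0.5], [0.5, 2.5]])
delta = incremental_covariance(cov_fine, cov_coarse)

cascade = convolve_centered(affine_gauss(X1, X2, cov_fine), affine_gauss(X1, X2, delta), h)
direct = affine_gauss(X1, X2, cov_coarse)
print("max deviation:", np.max(np.abs(cascade - direct)))
```

Calling `incremental_covariance(cov_coarse, cov_fine)` raises a `ValueError`. In this sketch, that is how the directional preference shows up: the coarser response cannot be turned back into the finer one by further Gaussian smoothing.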

Paper Structure

This paper contains 31 sections, 106 equations, 8 figures.

Figures (8)

  • Figure 1: Purely spatial receptive fields in terms of directional derivatives $\partial_{\varphi}^m$ of affine Gaussian kernels $g(x;\; s, \Sigma)$ of the form (\ref{eq-gauss-fcn-2D}) for orders $m = 1$ and $m = 2$, shown for different combinations of the spatial scale parameters $\sigma_1$ and $\sigma_2$, corresponding to two different eccentricities $\epsilon = \sigma_2/\sigma_1 \in \{1, 2 \}$ of the receptive fields, using the explicit parameterization of the affine Gaussian kernels in (\ref{eq-expl-par-Cxx})--(\ref{eq-expl-par-Cyy}) and (\ref{eq-def-aff-gauss-cont})--(\ref{eq-def-aff-gauss-cont-arg}). (Horizontal axes: horizontal image coordinate $x_1 \in [-10, 10]$. Vertical axes: vertical image coordinate $x_2 \in [-10, 10]$.)
  • Figure 2: Non-causal joint spatio-temporal receptive fields over a 1+1D spatio-temporal domain in terms of the mixed first-order spatial derivative and the first-order velocity-adapted temporal derivative of the form $T_{x\bar{t}}(x, t;\; s, \tau, v)$ according to (\ref{eq-mixed-strf-1-1}), as the product of a velocity-adapted 1-D Gaussian kernel $g_{1D}(x - v \, t;\; s)$ over the spatial domain and the non-causal Gaussian kernel $h(t;\; \tau) = g(t;\; \tau)$ over the temporal domain according to (\ref{eq-non-caus-temp-gauss}). The spatio-temporal receptive fields are shown for different values of the spatial scale parameter $\sigma_x = \sqrt{s}$ and the temporal scale parameter $\sigma_t = \sqrt{\tau}$, in dimensions of $[\hbox{length}]$ and $[\hbox{time}]$, respectively. (Horizontal axes: spatial image coordinate $x \in [-10, 10]$. Vertical axes: temporal variable $t \in [-4, 4]$.)
  • Figure 4: Illustration of the geometry underlying the composed locally linearized projection models in Equation (\ref{eq-spat-geom-img-transf}) and Equations (\ref{eq-spattemp-geom-img-transf})--(\ref{eq-t-transf}), when extended to a multi-view imaging situation, with each view indexed by $k$. We consider a local surface patch (possibly moving, in the spatio-temporal case), which is projected to an arbitrary view in a multi-view locally linearized projection model, with the fixation point $F$ on the surface mapped to the origin $O^{(k)} = 0$ in the image plane of the observer with optic center $P^{(k)}$. Then, any point in the tangent plane to the surface at the fixation point, parameterized by the local coordinates $\xi$ in a coordinate frame attached to the tangent plane with $\xi = 0$ at the fixation point $F$, is mapped by the local linearization to the image point $x^{(k)}$. (Figure reproduced from Lindeberg (Lin25-JMIV) with permission (OpenAccess).)
  • Figure 5: Schematic illustration of how receptive fields for different parameter values and at different image positions can be interrelated, which constitutes the main subject of study in this paper. Here, we derive such relationships between receptive field responses obtained for different values of the shape parameters of the receptive fields, in terms of either (i) infinitesimal relationships closely related to the notions of Lie groups and infinitesimal generators of semi-groups, or (ii) macroscopic relationships in terms of cascade smoothing properties, closely related to the notion of Lie algebras, which can in turn be related to Lie groups via the exponential map. Compared to regular Lie groups and Lie algebras, some of the evolution relations do, however, have a directional preference: the evolution can only be performed in a single direction, and not in the reverse direction. (In this figure, each line represents a connection between two receptive fields for different values of their parameters and/or their spatial positions.)
  • Figure 6: Generalized Hermite polynomials arising from derivatives of the purely spatial affine Gaussian kernel $T(x;\; s, \Sigma) = g(x;\; s, \Sigma)$ according to (\ref{eq-gauss-fcn-2D}) with respect to the image coordinates $x = (x_1, x_2)^T$ up to order 2, as well as with respect to the spatial scale parameter $s$ and the elements $\Sigma_{11}$, $\Sigma_{12}$ and $\Sigma_{22}$ of the spatial covariance matrix $\Sigma$.
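The directional derivative receptive fields in Figure 1 have closed-form expressions. The sketch below assumes the common parameterization $C = R(\varphi)\,\mathrm{diag}(\sigma_1^2, \sigma_2^2)\,R(\varphi)^T$ for a single covariance matrix $C$. The paper's own equations referenced in the caption are not reproduced here, and its $g(x;\; s, \Sigma)$ may split $C$ into $s$ and $\Sigma$. The derivative direction $e_\varphi = (\cos\varphi, \sin\varphi)^T$ is taken to be aligned with the first eigendirection. The derivatives are then $\partial_\varphi g = -(e_\varphi^T C^{-1} x)\, g$ and $\partial_{\varphi\varphi} g = ((e_\varphi^T C^{-1} x)^2 - e_\varphi^T C^{-1} e_\varphi)\, g$. The function names are hypothetical.

```python
import numpy as np


def affine_covariance(sigma1, sigma2, phi):
    """Assumed parameterization C = R(phi) diag(sigma1**2, sigma2**2) R(phi)^T."""
    c, s = np.cos(phi), np.sin(phi)
    rot = np.array([[c, -s], [s, c]])
    return rot @ np.diag([sigma1**2, sigma2**2]) @ rot.T


def directional_derivatives(x1, x2, cov, phi):
    """Affine Gaussian g(x; C) and its 1st/2nd derivatives along e = (cos phi, sin phi)."""
    cov_inv = np.linalg.inv(cov)
    pts = np.stack([x1, x2], axis=-1)  # shape (..., 2)
    w = pts @ cov_inv                  # rows are (C^{-1} x)^T, since C is symmetric
    quad = np.sum(pts * w, axis=-1)    # x^T C^{-1} x
    g = np.exp(-quad / 2.0) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    e = np.array([np.cos(phi), np.sin(phi)])
    proj = w @ e                       # e^T C^{-1} x
    g_phi = -proj * g
    g_phiphi = (proj**2 - e @ cov_inv @ e) * g
    return g, g_phi, g_phiphi


h = 0.05
grid = h * np.arange(-200, 201)        # [-10, 10], matching the figure axes
X1, X2 = np.meshgrid(grid, grid)
phi = np.pi / 6
cov = affine_covariance(sigma1=1.0, sigma2=2.0, phi=phi)  # eccentricity sigma2 / sigma1 = 2
g, g_phi, g_phiphi = directional_derivatives(X1, X2, cov, phi)
print("integral of g:", g.sum() * h * h)
```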
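For the velocity-adapted kernel in Figure 2, the velocity-adapted temporal derivative $\partial_{\bar{t}} = \partial_t + v\,\partial_x$ does not act on the factor $g_{1D}(x - v\,t;\; s)$. The mixed derivative therefore factorizes as $T_{x\bar{t}}(x, t;\; s, \tau, v) = g_{1D,x}(x - v\,t;\; s)\, g_t(t;\; \tau)$. The sketch below evaluates this closed form and compares it with finite differences. The first difference is taken along $x$ and the second along the space-time direction $(v, 1)$. The function names and parameter values are illustrative assumptions.

```python
import numpy as np


def g1d(x, s):
    """1-D Gaussian with variance s."""
    return np.exp(-x**2 / (2.0 * s)) / np.sqrt(2.0 * np.pi * s)


def g1d_deriv(x, s):
    """First derivative of the 1-D Gaussian with respect to its argument."""
    return -x / s * g1d(x, s)


def strf(x, t, s, tau, v):
    """Non-causal velocity-adapted kernel T(x, t; s, tau, v) = g(x - v t; s) g(t; tau)."""
    return g1d(x - v * t, s) * g1d(t, tau)


def strf_x_tbar(x, t, s, tau, v):
    """Mixed derivative d_x d_tbar T with d_tbar = d_t + v d_x, in closed form."""
    return g1d_deriv(x - v * t, s) * g1d_deriv(t, tau)


s, tau, v = 2.0, 1.0, 0.7
xs, ts = np.meshgrid(np.linspace(-10, 10, 81), np.linspace(-4, 4, 33))
d = 1e-3


def spatial_derivative_fd(x, t):
    """Central difference of T along the spatial axis."""
    return (strf(x + d, t, s, tau, v) - strf(x - d, t, s, tau, v)) / (2 * d)


# Central difference along (v, 1) approximates d_t + v d_x
fd = (spatial_derivative_fd(xs + v * d, ts + d) - spatial_derivative_fd(xs - v * d, ts - d)) / (2 * d)
exact = strf_x_tbar(xs, ts, s, tau, v)
print("max deviation:", np.max(np.abs(fd - exact)))
```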
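Under the locally linearized projection model in Figure 4, the fixation point is mapped to the origin. A spatial image patch then transforms as $x' = A\,x$ for some $2 \times 2$ matrix $A$. Affine Gaussian kernels satisfy the standard identity $g(A\,x;\; A \Sigma A^T) = g(x;\; \Sigma)/|\det A|$. Receptive fields with matched covariance matrices in two views are therefore related by the same linear transformation, which motivates expanding the receptive field family over $\Sigma$. The sketch below checks this identity numerically. The matrix $A$, the covariance values and the function name are hypothetical.

```python
import numpy as np


def affine_gauss(points, cov):
    """2-D affine Gaussian g(x; cov) evaluated at the rows of points, shape (n, 2)."""
    cov_inv = np.linalg.inv(cov)
    quad = np.einsum("ni,ij,nj->n", points, cov_inv, points)
    return np.exp(-quad / 2.0) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))


A = np.array([[1.3, 0.4], [-0.2, 0.8]])   # hypothetical local linearization between two views
cov = np.array([[2.0, 0.5], [0.5, 1.0]])  # covariance of the receptive field in view 1
cov_matched = A @ cov @ A.T               # transformed covariance for view 2

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))             # points in view 1
x_prime = x @ A.T                         # corresponding points x' = A x in view 2

lhs = affine_gauss(x_prime, cov_matched)
rhs = affine_gauss(x, cov) / abs(np.linalg.det(A))
print("max relative deviation:", np.max(np.abs(lhs - rhs) / rhs))
```

Using the unmatched covariance `cov` in view 2 breaks the identity. This is why a single fixed receptive field shape is not covariant under such transformations.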
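The generalized Hermite polynomials in Figure 6 are the polynomial factors $P$ in $\partial g = P(x)\, g$. Assuming that $g(x;\; s, \Sigma)$ has covariance matrix $s\,\Sigma$, they follow in closed form. For example, $\partial_s g = \left(\frac{x^T \Sigma^{-1} x}{2 s^2} - \frac{1}{s}\right) g$, which agrees with the infinitesimal generator relation $\partial_s g = \frac{1}{2} \sum_{i,j} \Sigma_{ij}\, \partial_{x_i x_j} g$. The sketch below covers derivatives with respect to $x$ and $s$ only, not the elements of $\Sigma$. Its function names and test values are hypothetical.

```python
import numpy as np


def gauss_s_sigma(x, s, sigma):
    """g(x; s, Sigma) with covariance s * Sigma (assumed convention); x has shape (n, 2)."""
    quad = np.einsum("ni,ij,nj->n", x, np.linalg.inv(sigma), x)
    return np.exp(-quad / (2.0 * s)) / (2.0 * np.pi * s * np.sqrt(np.linalg.det(sigma)))


def hermite_factors(x, s, sigma):
    """Polynomial factors P such that (derivative of g) = P(x) * g."""
    cov_inv = np.linalg.inv(s * sigma)
    w = x @ cov_inv  # rows are (C^{-1} x)^T with C = s * Sigma
    quad = np.einsum("ni,ij,nj->n", x, np.linalg.inv(sigma), x)
    return {
        "x1": -w[:, 0],
        "x2": -w[:, 1],
        "x1x1": w[:, 0] ** 2 - cov_inv[0, 0],
        "x1x2": w[:, 0] * w[:, 1] - cov_inv[0, 1],
        "x2x2": w[:, 1] ** 2 - cov_inv[1, 1],
        "s": quad / (2.0 * s**2) - 1.0 / s,
    }


rng = np.random.default_rng(1)
x = rng.normal(scale=2.0, size=(50, 2))
s = 1.5
sigma = np.array([[1.2, 0.4], [0.4, 0.7]])
g = gauss_s_sigma(x, s, sigma)
P = hermite_factors(x, s, sigma)

# Scale derivative: central difference in s, to compare with P["s"] * g
ds = 1e-6
g_s_fd = (gauss_s_sigma(x, s + ds, sigma) - gauss_s_sigma(x, s - ds, sigma)) / (2 * ds)

# Infinitesimal generator: d_s g = 1/2 * sum_ij Sigma_ij d_{x_i x_j} g
generator = 0.5 * (sigma[0, 0] * P["x1x1"] + 2.0 * sigma[0, 1] * P["x1x2"]
                   + sigma[1, 1] * P["x2x2"])
print("max |P_s - generator|:", np.max(np.abs(P["s"] - generator)))
```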