Convolution goes higher-order: a biologically inspired mechanism empowers image classification

Simone Azeglio, Olivier Marre, Peter Neri, Ulisse Ferrari

TL;DR

This work introduces higher-order convolutions (HoConv) by embedding Volterra-like polynomial expansions into convolutional layers to capture multiplicative pixel interactions, aiming to model higher-order statistics in natural images. By learning separate per-order feature maps up to the 4th order and stabilizing training with a normalization factor, HoCNNs demonstrate consistent performance gains over standard CNNs across synthetic textures and benchmarks, with 3rd/4th orders often providing the best results. Representational analyses (RSA) and perturbation studies reveal distinct, richer geometries in HoCNN representations and a nuanced sensitivity to higher-order statistics, supporting the biological motivation and suggesting practical benefits for real-world vision tasks. The paper also discusses limitations, notably computational overhead and robustness trade-offs, and outlines future directions including hybrid architectures with transformers and efficiency-focused approaches for broader applicability.

Abstract

We propose a novel approach to image classification inspired by complex nonlinear biological visual processing, whereby classical convolutional neural networks (CNNs) are equipped with learnable higher-order convolutions. Our model incorporates a Volterra-like expansion of the convolution operator, capturing multiplicative interactions akin to those observed in early and advanced stages of biological visual processing. We evaluated this approach on synthetic datasets, measuring its sensitivity to higher-order correlations, and on standard benchmarks (MNIST, FashionMNIST, CIFAR10, CIFAR100 and Imagenette). Our architecture outperforms traditional CNN baselines and achieves optimal performance with expansions up to 3rd/4th order, aligning remarkably well with the distribution of pixel intensities in natural images. Through systematic perturbation analysis, we validate this alignment by isolating the contributions of specific image statistics to model performance, demonstrating how different orders of convolution process distinct aspects of visual information. Furthermore, Representational Similarity Analysis reveals distinct geometries across network layers, indicating qualitatively different modes of visual information processing. Our work bridges neuroscience and deep learning, offering a path towards more effective, biologically inspired computer vision models. It provides insights into visual information processing and lays the groundwork for neural networks that better capture complex visual patterns, particularly in resource-constrained scenarios.
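
For orientation, a Volterra-like expansion of a convolution truncated at 3rd order can be written as below. The notation is an assumption made here for illustration and may differ from the paper's own equations: x is the flattened input patch of length n, b is a bias, and w^(1), w^(2), w^(3) are the learnable weights of each order. Any normalization factor used to stabilize training is omitted.

```latex
y \;=\; b
  \;+\; \sum_{i=1}^{n} w^{(1)}_{i}\, x_i
  \;+\; \sum_{i=1}^{n}\sum_{j=1}^{n} w^{(2)}_{ij}\, x_i x_j
  \;+\; \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} w^{(3)}_{ijk}\, x_i x_j x_k
```

The first sum is the classical (linear) convolution; the second and third sums add the multiplicative pixel interactions. A 4th-order term follows the same pattern with a four-index weight tensor.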

Paper Structure

This paper contains 38 sections, 3 equations, 14 figures, 9 tables.

Figures (14)

  • Figure 1: Extending classical convolution. (A) Implementation of higher-order convolution: the input patch is flattened to a vector x. The 1st order uses classical convolution with weight sharing. The 2nd and 3rd orders take the outer product of x with itself (once and twice, respectively) and then the dot product with the corresponding weights. Feature maps are then summed before ReLU activation. (B) Cumulative explained variance of principal components for the standard CNN model vs. higher-order CNN models with 2nd- and 3rd-order expansions quantitatively confirms the tied-weight issue for classical models. Inset: example of the 32×32 binary texture image used in this analysis, containing all possible 1-, 2-, 3-, and 4-point correlations for 2×2 patches.
  • Figure 2: Multipoint correlations and glider classification. (A) Textures generated with N-point gliders (N ranging from 1 to 4), totaling 10 classes when taking into account parity constraints. (B) Confusion matrices for different models. Top left: baseline CNN; top right: higher-order CNN (HoCNN) with kernels expanded up to the 2nd order; bottom left: HoCNN with kernels expanded up to the 3rd order; bottom right: HoCNN with kernels expanded up to the 4th order. Taken together, the four confusion matrices show that higher orders progressively allow the network to capture the features relevant for image classification.
  • Figure 3: (A) Comparison of standard convolution (Conv) and (B) higher-order convolution (HoConv) blocks.
  • Figure 4: Perturbation analysis and neural representations. (A) An example CIFAR-10 test image and perturbed versions with different 1-, 2-, 3-, and 4-point correlation statistics at a common intensity (I = 0.12). (B) Accuracy normalized to unperturbed CIFAR-10 for convolutional (blue) vs. higher-order (orange) networks: the higher-order network performs systematically worse under more structured perturbations at fixed intensity (I = 0.12). (C) Representational Dissimilarity Matrix (RDM) for the baseline convolutional block (Conv layer, Batch Norm, ReLU, Pooling). (D) RDM for the higher-order convolutional block. (E) Log ratio between the two RDMs, capturing the different representational geometries of the two blocks.
  • Figure 5: Validation accuracy curves for CNN and HoCNN on image classification benchmarks. Learning curves comparing the performance of the baseline CNN and the proposed HoCNN on MNIST, FashionMNIST, CIFAR-10, and CIFAR-100 datasets. The HoCNN consistently outperforms the CNN across all benchmarks, demonstrating the effectiveness of incorporating higher-order interactions in the convolutional layers.
  • ...and 9 more figures
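
The computation described in the Figure 1(A) caption can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names (`volterra_term`, `ho_response`, `ho_conv2d`) and the nested-list weight layout are hypothetical, the training-time normalization factor mentioned in the TL;DR is omitted, and looping over every index tuple is far slower than a tensor-library version would be.

```python
from itertools import product
from math import prod


def volterra_term(x, weights, order):
    """Sum of w[i1]...[ik] * x[i1] * ... * x[ik] over all index tuples.

    Equivalent to the k-fold outer product of x with itself, dotted with
    a weight tensor of the same shape (stored here as nested lists).
    """
    total = 0.0
    for idx in product(range(len(x)), repeat=order):
        w = weights
        for i in idx:          # walk into the nested weight lists
            w = w[i]
        total += w * prod(x[i] for i in idx)
    return total


def ho_response(x, w1, w2, w3, bias=0.0):
    """Sum the 1st-, 2nd- and 3rd-order terms for one flattened patch, then ReLU."""
    pre = (bias
           + volterra_term(x, w1, 1)
           + volterra_term(x, w2, 2)
           + volterra_term(x, w3, 3))
    return max(0.0, pre)


def ho_conv2d(image, k, w1, w2, w3, bias=0.0):
    """Slide a k x k window over a 2-D image (no padding, stride 1).

    The same weights are applied to every patch, as in classical convolution.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - k + 1):
        out_row = []
        for c in range(cols - k + 1):
            patch = [image[r + dr][c + dc] for dr in range(k) for dc in range(k)]
            out_row.append(ho_response(patch, w1, w2, w3, bias))
        out.append(out_row)
    return out


# Toy example on a two-element "patch" with hand-picked weights.
x = [1.0, 2.0]
w1 = [0.5, -1.0]                              # 1st order: 0.5*1 - 1*2 = -1.5
w2 = [[0.0, 1.0], [0.0, 0.0]]                 # 2nd order: 1 * x0*x1 = 2.0
w3 = [[[0.0, 0.5], [0.0, 0.0]],
      [[0.0, 0.0], [0.0, 0.0]]]               # 3rd order: 0.5 * x0*x0*x1 = 1.0
print(ho_response(x, w1, w2, w3))             # prints 1.5
```

For a patch of n pixels the order-k weight tensor has n^k entries, which illustrates the computational overhead the TL;DR lists as a limitation.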