Convolution goes higher-order: a biologically inspired mechanism empowers image classification
Simone Azeglio, Olivier Marre, Peter Neri, Ulisse Ferrari
TL;DR
This work introduces higher-order convolutions (HoConv), which embed Volterra-like polynomial expansions into convolutional layers to capture multiplicative interactions between pixels and thereby model the higher-order statistics of natural images. The resulting higher-order CNNs (HoCNNs) learn separate feature maps for each order, up to the 4th, and use a normalization factor to stabilize training. They show consistent performance gains over standard CNNs on synthetic textures and standard benchmarks, with the 3rd and 4th orders often giving the best results. Representational Similarity Analysis (RSA) and perturbation studies reveal distinct, richer geometries in HoCNN representations and a nuanced sensitivity to higher-order statistics, which supports the biological motivation and suggests practical benefits for real-world vision tasks. The paper also discusses limitations, notably computational overhead and robustness trade-offs, and outlines future directions, including hybrid architectures with transformers and efficiency-focused approaches for broader applicability.
Abstract
We propose a novel approach to image classification inspired by complex nonlinear biological visual processing, whereby classical convolutional neural networks (CNNs) are equipped with learnable higher-order convolutions. Our model incorporates a Volterra-like expansion of the convolution operator, capturing multiplicative interactions akin to those observed in early and advanced stages of biological visual processing. We evaluated this approach on synthetic datasets, by testing its sensitivity to higher-order correlations, and on standard benchmarks (MNIST, FashionMNIST, CIFAR10, CIFAR100 and Imagenette). Our architecture outperforms traditional CNN baselines and achieves its best performance with expansions up to 3rd/4th order, aligning remarkably well with the distribution of pixel intensities in natural images. Through systematic perturbation analysis, we validate this alignment by isolating the contributions of specific image statistics to model performance, demonstrating how different orders of convolution process distinct aspects of visual information. Furthermore, Representational Similarity Analysis reveals distinct geometries across network layers, indicating qualitatively different modes of visual information processing. Our work bridges neuroscience and deep learning, offering a path towards more effective, biologically inspired computer vision models. It provides insights into visual information processing and lays the groundwork for neural networks that better capture complex visual patterns, particularly in resource-constrained scenarios.
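To make the idea of a Volterra-like expansion concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single-channel image, no padding, and an expansion truncated at second order: each output value is a bias, plus the usual linear filter response, plus a weighted sum of products of pixel pairs within the same patch. The function names (`extract_patches`, `higher_order_conv2d`) and the weight layout are hypothetical, and the normalization factor mentioned in the TL;DR is left out because its form is not given here.

```python
import numpy as np


def extract_patches(image, k):
    """Return every k x k patch of a 2D image, flattened to shape (H', W', k*k)."""
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    return windows.reshape(windows.shape[0], windows.shape[1], k * k)


def higher_order_conv2d(image, w1, w2, bias=0.0):
    """Second-order Volterra-like convolution (illustrative sketch only).

    image : (H, W) array, single channel
    w1    : (k*k,) first-order kernel, i.e. a flattened standard filter
    w2    : (k*k, k*k) second-order kernel weighting pixel-pair products
    """
    k = int(round(np.sqrt(w1.size)))
    patches = extract_patches(image, k)  # (H', W', k*k)

    # First-order term: sum_i w1[i] * x[i] (ordinary cross-correlation).
    first = np.einsum("hwi,i->hw", patches, w1)

    # Second-order term: sum_{i,j} w2[i, j] * x[i] * x[j]
    # (multiplicative interactions between pixels of the same patch).
    second = np.einsum("hwi,ij,hwj->hw", patches, w2, patches)

    return bias + first + second


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(8, 8))
    w1 = rng.normal(size=9)
    w2 = rng.normal(size=(9, 9))
    out = higher_order_conv2d(img, w1, w2)
    print(out.shape)  # (6, 6) for an 8 x 8 image and a 3 x 3 kernel
```

Setting `w2` to zero recovers a standard convolutional filter, so the second-order term is strictly an addition to the classical operator. The second-order kernel has (k*k)^2 entries rather than k*k, and the count grows further with each additional order, which illustrates the computational overhead noted among the limitations.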
