Explicitly Modeling Pre-Cortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness

Lucas Piper; Arlindo L. Oliveira; Tiago Marques

TL;DR

Two novel biologically-inspired CNN model families that incorporate a new front-end block designed to simulate pre-cortical visual processing are introduced, showing that simulating multiple stages of early visual processing in CNN early layers provides cumulative benefits for model robustness.

Abstract

While convolutional neural networks (CNNs) excel at clean image classification, they struggle to classify images corrupted with different common corruptions, limiting their real-world applicability. Recent work has shown that incorporating a CNN front-end block that simulates some features of the primate primary visual cortex (V1) can improve overall model robustness. Here, we expand on this approach by introducing two novel biologically-inspired CNN model families that incorporate a new front-end block designed to simulate pre-cortical visual processing. RetinaNet, a hybrid architecture containing the novel front-end followed by a standard CNN back-end, shows a relative robustness improvement of 12.3% when compared to the standard model; and EVNet, which further adds a V1 block after the pre-cortical front-end, shows a relative gain of 18.5%. The improvement in robustness was observed for all the different corruption categories, though accompanied by a small decrease in clean image accuracy, and generalized to a different back-end architecture. These findings show that simulating multiple stages of early visual processing in CNN early layers provides cumulative benefits for model robustness.
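The relative gains quoted above can be read as a change in accuracy expressed as a fraction of the baseline model's accuracy. The sketch below assumes that definition, which this summary does not spell out; the function name `relative_gain` and the two accuracy values are hypothetical and are not numbers reported in the paper.

```python
def relative_gain(model_acc: float, baseline_acc: float) -> float:
    """Relative change in accuracy with respect to a baseline, as a fraction."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (model_acc - baseline_acc) / baseline_acc


# Hypothetical accuracies, chosen only to make the arithmetic easy to follow.
# They are NOT values reported in the paper.
baseline = 0.40  # assumed mean corruption accuracy of the standard model
variant = 0.45   # assumed mean corruption accuracy of a front-end variant

gain = relative_gain(variant, baseline)
print(f"relative gain: {gain:.1%}")
```

Under this reading, a model can show a double-digit relative gain while the absolute accuracy difference is only a few percentage points, so relative and absolute figures should be compared with care.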

Paper Structure (34 sections, 3 equations, 4 figures, 2 tables)

Introduction
Related Work
Retina modeling.
Common corruptions.
Neuro-inspired models.
Methods
Push-pull pattern.
DoG convolution.
Light adaptation.
Contrast normalization.
Results
The RetinaBlock simulates empirical retinal ganglion cell response properties
RetinaNets improve robustness against corruptions
The RetinaBlock-VOneBlock interaction provides cumulative robustness gains
Discussion
...and 19 more sections

Figures (4)

Figure 1: Simulating early visual processing of primates as CNN front-end blocks. (A) The RetinaBlock integrates a light-adaptation layer, a DoG convolutional layer with color-opponent pathways for midget cells, and, for parasol cells, a contrast-normalization layer. (B) VOneNet, RetinaNet and EVNet comprise an initial block designed to simulate a specific stage of the visual system, followed by a standard CNN architecture. The VOneNet includes the VOneBlock; the RetinaNet includes the RetinaBlock; and the EVNet includes both.
Figure 2: RetinaBlock simulates retinal response properties to SF and contrast. (A) Contrast sensitivity curves of example midget and parasol cells of the RetinaBlock, with the corresponding stimuli below. Arrows denote where logarithmic saturation begins, estimated by fitting a log contrast response function [RaghavanENEURO.0515-22.2023]. (B) SF tuning curves, with the corresponding grating stimuli below. The activation range differs across cell types because of the compression introduced by the contrast-normalization layer. The optimal SF is 4.2 cycles per degree (cpd) for midget cells and 1.0 cpd for parasol cells. (C) Distribution of optimal SF for VOneBlock cells with and without prior RetinaBlock processing.
Figure 3: RetinaNets improve robustness to all corruption categories and EVNets further improve upon VOneNets and RetinaNets. (A) Relative accuracy (normalized by ResNet18 accuracy) on clean images and all corruption categories for the base ResNet18, VOneResNet18, RetinaResNet18 and EVResNet18 (see Table \ref{tab:abs_acc_corr} and Figure \ref{fig:A1} for absolute accuracies). Bars represent the mean and error bars represent the standard error ($n$ = 4 seed initializations). (B) Relative accuracy (normalized by VGG16 accuracy) on clean images and all corruption categories for models based on the VGG16 architecture (absolute accuracies in Table \ref{tab:abs_acc_vgg16}).
Figure 4: Absolute top-1 accuracies of ResNet18, VOneResNet18, RetinaResNet18 and EVResNet18 for 15 corruption types at 5 perturbation severity levels. Lines represent the mean and error bars represent the standard error of the mean ($n$ = 4).
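The Figure 1 caption names a DoG (difference-of-Gaussians) convolutional layer as part of the RetinaBlock. As a rough illustration of the center-surround filter such a layer is built on, the sketch below constructs a single-channel DoG kernel with NumPy. It is not the authors' implementation: the function name `dog_kernel`, the kernel size, and both sigma values are assumptions made for illustration, and the color-opponent pathways, light adaptation, and contrast normalization described in the caption are left out.

```python
import numpy as np


def dog_kernel(size: int, sigma_center: float, sigma_surround: float) -> np.ndarray:
    """Build a 2D difference-of-Gaussians kernel (narrow center minus wide surround).

    All parameters are illustrative assumptions, not values from the paper.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a central pixel")
    if not 0 < sigma_center < sigma_surround:
        raise ValueError("expected 0 < sigma_center < sigma_surround")

    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x ** 2 + y ** 2  # squared distance from the kernel center

    center = np.exp(-r2 / (2.0 * sigma_center ** 2))
    surround = np.exp(-r2 / (2.0 * sigma_surround ** 2))

    # Normalize each Gaussian to unit sum so the difference sums to zero,
    # which makes the filter insensitive to uniform illumination.
    center /= center.sum()
    surround /= surround.sum()
    return center - surround


# Hypothetical parameters, in pixels.
kernel = dog_kernel(size=21, sigma_center=1.0, sigma_surround=3.0)
print(kernel.shape, float(kernel.sum()))
```

With these settings the kernel is positive at its center and negative in the surround, giving the band-pass spatial-frequency behavior that the SF tuning curves in Figure 2 characterize for the model's cells.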
