pAE: An Efficient Autoencoder Architecture for Modeling the Lateral Geniculate Nucleus by Integrating Feedforward and Feedback Streams in Human Visual System

Moslem Gorji, Amin Ranjbar, Mohammad Bagher Menhaj

TL;DR

This paper introduces a deep convolutional model that closely approximates human visual information processing. The proposed deep-tuned model achieves results highly similar to human benchmarks and performs significantly better than the other models compared.

Abstract

The visual cortex is a vital part of the brain, responsible for hierarchically identifying objects. Understanding the role of the lateral geniculate nucleus (LGN), a region that precedes the visual cortex, is crucial to explaining how visual information is processed along both bottom-up and top-down pathways. When visual stimuli reach the retina, they are transmitted to the LGN for initial processing before being sent to the visual cortex for further processing. In this study, we introduce a deep convolutional model that closely approximates human visual information processing. We approximate the function of the LGN using a trained shallow convolutional model based on a pruned autoencoder (pAE) architecture. The pAE model attempts to integrate the feedforward and feedback streams to and from the V1 area. The modeling framework covers both temporal and non-temporal data feeding modes of the visual stimuli dataset, which contains natural images captured by a fixed camera in consecutive frames and features two categories: images with animals (in motion) and images without animals. We then compare the results of the proposed deep-tuned model with wavelet filter bank methods employing Gabor and biorthogonal wavelet functions. Our experiments show that the proposed method not only achieves results highly similar to human benchmarks but also performs significantly better than the other models. The pAE model achieves a final prediction performance of 99.26% and demonstrates a notable improvement of around 28% over human results in the temporal mode.
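
To make the feedforward and feedback streams concrete, the following is a minimal sketch of the encoder/decoder arrangement described in the Figure 4 caption: a single convolution stands in for the LGN-to-V1 (feedforward) path and a single transposed convolution for the V1-to-LGN (feedback) path. It is not the authors' implementation. The function names, the fixed Laplacian kernel, and the tied encoder/decoder weights are assumptions made for illustration, and training and pruning are omitted entirely.

```python
# Hypothetical sketch of a one-layer convolutional autoencoder.
# Assumptions: a fixed (untrained) Laplacian kernel and tied weights;
# the paper's trained, pruned model is NOT reproduced here.

EDGE_KERNEL = [[0, 1, 0],
               [1, -4, 1],
               [0, 1, 0]]  # assumed stand-in for a learned edge filter


def conv2d_valid(image, kernel):
    """Feedforward step: 'valid' 2D cross-correlation of image with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv2d_transpose(feature_map, kernel):
    """Feedback step: transposed convolution back to the input resolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) + kh - 1
    out_w = len(feature_map[0]) + kw - 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i, row in enumerate(feature_map):
        for j, value in enumerate(row):
            # Scatter each feature value over the kernel footprint.
            for a in range(kh):
                for b in range(kw):
                    out[i + a][j + b] += value * kernel[a][b]
    return out


def pae_forward(image, kernel=EDGE_KERNEL):
    """Return (edge map, decoder output) for one grayscale image."""
    edges = conv2d_valid(image, kernel)                # encoder: LGN -> V1
    reconstruction = conv2d_transpose(edges, kernel)   # decoder: V1 -> LGN
    return edges, reconstruction


if __name__ == "__main__":
    # A 5x5 image with a vertical step edge.
    step = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
    edges, reconstruction = pae_forward(step)
    print(len(edges), len(edges[0]))                     # edge map size
    print(len(reconstruction), len(reconstruction[0]))   # matches input size
```

With an untrained kernel the decoder output only restores the input resolution, not the input itself; in a trained autoencoder the kernels would be fitted so that the decoder output approximates the original image.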

Paper Structure (22 sections, 6 equations, 14 figures, 1 table)

Figures (14)

  • Figure 1: The schematic of both biological and artificial visual processing systems in humans. The visual information processing initiates in the LGN area where the input data is crucially prepared for further processing through the ventral pathway in late visual areas, specifically from V2 to the IT. The developed model aims to mimic the biological system and predicts labels that closely resemble the true labels.
  • Figure 2: Captured image frames depicting animal and non-animal movements. The upper two rows display two series of three consecutive frames, each capturing an animal in motion. In contrast, the lower two rows showcase two sets of three consecutive frames capturing movements of non-animal objects such as leaves, sea, waterfall, humans, etc.
  • Figure 3: Proposed modeling framework for processing visual data in LGN and V1 areas. This model demonstrates the processing of the initial image by the LGN area for edge detection in V1. The image is then reconstructed for further computation of difference with the second consecutive frame for image classification, simulating the ventral visual cortex.
  • Figure 4: The autoencoder model approximates the LGN-V1 connection using the pAE model. This model builds a more simplified architecture that mimics the actions of the visual cortex's first layer, or V1. Using a single convolutional layer as an encoder and a deconvolutional layer that functions as a decoder, this model effectively captures the characteristics of the first visual layer and also simulates the backward path from V1 to the LGN. The image reconstruction from the edge-detected outputs based on terminal convolutional kernels ensures an accurate representation of the LGN region. Overall, the innovative design of pAE demonstrates a biologically plausible model for visual processing, enhancing our understanding of the LGN-V1 interactions through sophisticated simulation techniques.
  • Figure 5: The multi-resolution analysis to approximate the LGN-V1 connection using DWT. This model decomposes input images into different frequency components at various scales. The proposed model uses detailed wavelet coefficients to generate edge-detected images at V1 and consequently applies inverse DWT on these coefficients to reconstruct the images in the backward stream.
  • ...and 9 more figures
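
The wavelet baseline in the Figure 5 caption (decompose the image, use the detail coefficients as the edge representation, then invert the transform for the backward stream) can be illustrated with a single-level 2D transform. The sketch below uses the Haar wavelet for brevity, whereas the paper employs Gabor and biorthogonal wavelet functions. The function names are hypothetical, and combining the three detail bands into one magnitude map is an assumption, not a step stated in the source.

```python
import math

# Hypothetical single-level 2D Haar DWT sketch (orthonormal scaling).
# Assumes the image has even height and width.


def haar_dwt2(image):
    """Split an image into approximation (LL) and detail (LH, HL, HH) bands."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    ll, lh, hl, hh = ([[0.0] * cols for _ in range(rows)] for _ in range(4))
    for i in range(rows):
        for j in range(cols):
            a, b = image[2 * i][2 * j], image[2 * i][2 * j + 1]
            c, d = image[2 * i + 1][2 * j], image[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2  # local average
            lh[i][j] = (a + b - c - d) / 2  # change across rows
            hl[i][j] = (a - b + c - d) / 2  # change across columns
            hh[i][j] = (a - b - c + d) / 2  # diagonal change
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Backward stream: invert haar_dwt2 and rebuild the full-size image."""
    rows, cols = len(ll), len(ll[0])
    image = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            s, v, h, g = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            image[2 * i][2 * j] = (s + v + h + g) / 2
            image[2 * i][2 * j + 1] = (s + v - h - g) / 2
            image[2 * i + 1][2 * j] = (s - v + h - g) / 2
            image[2 * i + 1][2 * j + 1] = (s - v - h + g) / 2
    return image


def detail_edge_map(lh, hl, hh):
    """Assumed edge measure: per-position magnitude of the detail bands."""
    return [
        [math.sqrt(lh[i][j] ** 2 + hl[i][j] ** 2 + hh[i][j] ** 2)
         for j in range(len(lh[0]))]
        for i in range(len(lh))
    ]


if __name__ == "__main__":
    # 4x4 image with a vertical edge between columns 0 and 1.
    image = [[0.0, 1.0, 1.0, 1.0] for _ in range(4)]
    ll, lh, hl, hh = haar_dwt2(image)
    print(detail_edge_map(lh, hl, hh))
    print(haar_idwt2(ll, lh, hl, hh))
```

A single-level Haar transform only responds to changes inside each 2x2 block, so an edge that falls exactly on a block boundary produces no detail response at that level; multi-resolution analysis at several scales, as named in the caption, is one way to address this.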