Modelling the Human Intuition to Complete the Missing Information in Images for Convolutional Neural Networks

Robin Koç, Fatoş T. Yarman Vural

TL;DR

This work tackles the fragility of CNNs to missing information by introducing a Gestalt-inspired, two-layer intuition framework. A memory layer stores eigen-image templates derived from convolution outputs, along with stock feature maps, while an intuition layer at test time identifies a dominant class via Pearson correlations and replaces the incomplete test feature maps with dominant-class templates when the estimated class posterior falls below a threshold $T$. The authors formalize the memory templates through a covariance matrix $S_{jk}$ and its eigen-decomposition $S_{jk}=\boldsymbol\Psi_{jk}\boldsymbol\lambda_{jk}\boldsymbol\Psi_{jk}^{T}$ to obtain the top $\delta$ eigen-images $\psi_{jk}$, and they use stock maps $\mathbf{f}_{Jk}$ for replacement. Experiments on MNIST demonstrate improved robustness to information loss, with notable gains as more segments are deleted ($s \rightarrow s_{max}$) and the intuition layer is activated (best around $T=0.9$), indicating the potential of cognitively inspired mechanisms to enhance image completion in CNNs. The approach offers a principled bridge between human visual intuition and machine learning, and a framework for resilient perception under occlusion or corruption.
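The memory-layer construction summarized above (covariance $S_{jk}$, eigen-decomposition, top $\delta$ eigen-images) can be illustrated with a short NumPy sketch. It assumes, as an interpretation not confirmed by this summary, that $j$ indexes the class and $k$ the convolutional filter, and that the $N$ training feature maps for one $(j, k)$ pair are stacked into an array of shape `(N, H, W)`; the function name `eigen_images` is hypothetical.

```python
import numpy as np


def eigen_images(feature_maps: np.ndarray, delta: int) -> np.ndarray:
    """Return the top-`delta` eigen-images for one (class j, filter k) pair.

    feature_maps: shape (N, H, W), the filter-k outputs of N training
    images of class j (assumed layout). Returns shape (delta, H, W).
    """
    n, h, w = feature_maps.shape
    X = feature_maps.reshape(n, h * w)         # one flattened map per row
    S = np.cov(X, rowvar=False)                # (H*W, H*W) covariance S_jk
    eigvals, eigvecs = np.linalg.eigh(S)       # S = Psi diag(lambda) Psi^T
    top = np.argsort(eigvals)[::-1][:delta]    # indices of the delta largest eigenvalues
    return eigvecs[:, top].T.reshape(delta, h, w)


# Toy usage with random "feature maps" standing in for real conv outputs.
rng = np.random.default_rng(0)
templates = eigen_images(rng.normal(size=(50, 4, 4)), delta=3)
print(templates.shape)  # (3, 4, 4)
```

Because `np.linalg.eigh` returns eigenvalues in ascending order, the sketch reverses the sort to keep the $\delta$ largest. Each returned template is a unit-norm eigenvector reshaped to the feature-map grid, so its sign is arbitrary.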

Abstract

In this study, we attempt to model intuition and incorporate this formalism to improve the performance of Convolutional Neural Networks (CNNs). Despite decades of research, ambiguities persist regarding the principles of intuition. Experimental psychology reveals many types of intuition, which depend on the state of the human mind. We focus on visual intuition, which is useful for completing missing information during visual cognitive tasks. First, we set up a scenario that gradually decreases the amount of visual information in the images of a dataset to examine its impact on CNN accuracy. Then, we present a model for visual intuition based on Gestalt theory. The theory claims that humans derive a set of templates from their subconscious experiences. When the brain decides that information is missing from a scene, for example because of occlusion, it instantaneously completes the scene by replacing the missing parts with the most similar ones. Based on Gestalt theory, we model visual intuition in two layers: a subconscious memory layer and an intuition layer. Details of these layers are provided throughout the paper. We use the MNIST dataset to test the suggested intuition model for completing missing information. Experiments show that the augmented CNN architecture performs better than the classic models on incomplete images.
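The degradation scenario described in the abstract, gradually deleting image segments, can be reproduced in spirit with the sketch below. The summary does not define a segment, so the function `delete_segments`, the square-grid partition, the zero fill value, and the default `grid=7` (49 cells of 4×4 pixels on a 28×28 MNIST digit) are all hypothetical choices.

```python
import numpy as np


def delete_segments(image: np.ndarray, s: int, grid: int = 7, rng=None) -> np.ndarray:
    """Zero out `s` randomly chosen cells of a grid x grid partition of `image`.

    Hypothetical stand-in for the paper's segment deletion; the partition
    and the zero fill value are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    ch, cw = h // grid, w // grid              # cell height and width in pixels
    out = image.copy()                         # leave the input untouched
    for idx in rng.choice(grid * grid, size=s, replace=False):
        r, c = divmod(int(idx), grid)          # cell row and column
        out[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw] = 0
    return out


# Build progressively harder test versions of one digit, as in Figure 1.
rng = np.random.default_rng(0)
digit = np.ones((28, 28))
variants = {s: delete_segments(digit, s, rng=rng) for s in (0, 4, 8, 16)}
print({s: int((v == 0).sum()) for s, v in variants.items()})
# {0: 0, 4: 64, 8: 128, 16: 256}
```

Sampling cells without replacement guarantees that exactly `s` distinct cells are removed, so the amount of deleted information grows linearly with `s`.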

Paper Structure

This paper contains 13 sections, 10 equations, 5 figures, and 1 table.

Figures (5)

  • Figure 1: Sample images of digits 2, 3, 4, 5, 6, and 8 with $s=0, 4, 8$, and $16$ deleted segments, arranged in columns from left to right. The first column shows the complete images ($s=0$) used for training the CNN. The remaining columns, generated with $s\ge 4$, are samples from the incomplete test sets, with information deficiency increasing from left to right.
  • Figure 2: A simple CNN model augmented with subconscious memory and intuition layers. Orange represents the test steps, augmented by the intuition layer, and blue represents the training steps, augmented with the subconscious memory layer. Dashed lines are activated only when the posterior probabilities of the fully connected layer, obtained in the test step, are below a predefined threshold value.
  • Figure 3: Eigen-images of digits obtained at the output of the first convolutional filter. Each pair of rows shows the eigen-images corresponding to the first, second, and third largest eigenvalues, respectively. These eigen-images are assumed to represent the object templates in subconscious memory.
  • Figure 4: The blue plot shows the performance of the standard CNN model, and the red plots show the performance of the augmented CNN model for different threshold values. The intuition layer is activated when the estimated class posterior is below a threshold $T$; lighter shades of red correspond to higher values of $T$.
  • Figure 5: Incomplete digit images of 4 and 0, derived from the MNIST dataset with 8 segments removed. The standard CNN model misclassifies digit 0 as 2 but correctly identifies digit 4. Conversely, the augmented CNN accurately labels 0 but classifies digit 4 as 9.
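The test-time gating described in the Figure 2 and Figure 4 captions (the intuition layer fires only when the estimated class posterior is below $T$), combined with the TL;DR's Pearson-correlation search for a dominant class, can be sketched as follows. Everything here is an illustrative assumption rather than the authors' code: the function names, the `(K, H, W)` layout with one feature map per filter, the rule that the dominant class has the highest mean absolute correlation across filters, and replacing the entire set of maps with the dominant class's stock maps.

```python
import numpy as np


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation coefficient between two flattened feature maps."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


def intuition_layer(maps, posterior, templates, stock, T=0.9):
    """Hypothetical intuition layer.

    maps:      (K, H, W) test feature maps, one per conv filter k
    posterior: class probabilities from the standard forward pass
    templates: {class j: (K, H, W) leading eigen-image per filter}
    stock:     {class j: (K, H, W) stock feature maps f_jk}
    Returns (maps for the remaining layers, dominant class J or None).
    """
    if np.max(posterior) >= T:       # confident prediction: layer stays inactive
        return maps, None
    # Absolute value because eigen-image signs are arbitrary (assumption).
    scores = {
        j: np.mean([abs(pearson(maps[k], tpl[k])) for k in range(len(maps))])
        for j, tpl in templates.items()
    }
    J = max(scores, key=scores.get)  # dominant class
    return stock[J].copy(), J        # replace the incomplete maps with f_Jk


# Toy check: test maps that closely resemble class 1's templates.
rng = np.random.default_rng(0)
templates = {j: rng.normal(size=(2, 5, 5)) for j in (0, 1)}
stock = {j: np.full((2, 5, 5), float(j)) for j in (0, 1)}
test_maps = templates[1] + 0.05 * rng.normal(size=(2, 5, 5))
_, J = intuition_layer(test_maps, np.array([0.6, 0.4]), templates, stock)
print(J)  # 1
```

Raising `T` makes the layer fire on more test images; with `T=0.9` it intervenes whenever the classifier's top posterior is under 90%.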