Modelling the Human Intuition to Complete the Missing Information in Images for Convolutional Neural Networks
Robin Koç, Fatoş T. Yarman Vural
TL;DR
This work tackles the fragility of CNNs to missing information by introducing a Gestalt-inspired two-layer intuition framework. A memory layer stores eigen-image templates derived from convolution outputs, along with stock feature maps, while an intuition layer at test time identifies a dominant class via Pearson correlations and replaces incomplete test feature maps with dominant-class templates when a posterior threshold $T$ is exceeded. The authors formalize the memory templates through covariance $S_{jk}$ and eigen-decomposition $S_{jk}=\boldsymbol\Psi_{jk}\boldsymbol\lambda_{jk}\boldsymbol\Psi_{jk}^{T}$ to obtain top $\delta$ eigen-images $\psi_{jk}$, and they use stock maps $\mathbf{f}_{Jk}$ for replacement. Experiments on MNIST demonstrate improved robustness to information loss, with notable gains when segments are deleted ($s \rightarrow s_{max}$) and the intuition layer activated (best around $T=0.9$), indicating the potential of cognitive-inspired mechanisms to enhance image completion in CNNs. The approach provides a principled bridge between human visual intuition and machine learning, offering a framework for resilient perception under occlusion or corruption.
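The two layers summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`eigen_templates`, `pearson`, `intuition_layer`), the dictionary layout, and in particular the way the posterior is formed (absolute Pearson scores normalized to sum to one) are assumptions made only for illustration; the summary does not specify them.

```python
import numpy as np

def eigen_templates(maps, delta):
    """Memory layer (sketch): top-`delta` eigen-images for one class j, one filter k.

    maps: array of shape (n_samples, h, w) holding feature maps.
    Returns an array of shape (delta, h, w).
    """
    n, h, w = maps.shape
    X = maps.reshape(n, h * w)
    X = X - X.mean(axis=0)                 # centre before forming the covariance
    S = X.T @ X / max(n - 1, 1)            # covariance S_jk, shape (h*w, h*w)
    eigvals, eigvecs = np.linalg.eigh(S)   # S is symmetric; eigenvalues ascend
    top = np.argsort(eigvals)[::-1][:delta]
    return eigvecs[:, top].T.reshape(delta, h, w)

def pearson(a, b):
    """Pearson correlation between two arrays of equal size."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b / denom)

def intuition_layer(test_map, templates, stock_maps, T):
    """Intuition layer (sketch): replace `test_map` when one class dominates.

    templates:  dict class -> (delta, h, w) eigen-images
    stock_maps: dict class -> (h, w) stock feature map
    Returns (feature_map, dominant_class_or_None).
    """
    classes = sorted(templates)
    # Assumption: score a class by its best absolute correlation.
    scores = np.array([
        max(abs(pearson(test_map, t)) for t in templates[c]) for c in classes
    ])
    total = scores.sum()
    if total == 0:
        return test_map, None
    posterior = scores / total             # assumed posterior, not from the paper
    best = int(np.argmax(posterior))
    if posterior[best] > T:
        return stock_maps[classes[best]], classes[best]
    return test_map, None
```

In this sketch, a test feature map that correlates strongly with a single class is swapped for that class's stock map, while an ambiguous map is passed through unchanged.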
Abstract
In this study, we attempt to model intuition and incorporate this formalism to improve the performance of Convolutional Neural Networks (CNNs). Despite decades of research, the principles of intuition remain ambiguous. Experimental psychology reveals many types of intuition, which depend on the state of the human mind. We focus on visual intuition, which is useful for completing missing information during visual cognitive tasks. First, we set up a scenario that gradually decreases the amount of visual information in the images of a dataset, and we examine its impact on CNN accuracy. Then, we present a model of visual intuition based on Gestalt theory. The theory claims that humans derive a set of templates from their subconscious experiences. When the brain decides that information is missing from a scene, for example because of occlusion, it instantaneously completes the scene by replacing the missing parts with the most similar ones. Based on Gestalt theory, we model visual intuition in two layers, which are detailed throughout the paper. We use the MNIST dataset to test the suggested intuition model for completing missing information. Experiments show that the augmented CNN architecture outperforms the classic models on incomplete images.
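The information-removal scenario can be pictured with a small sketch. It assumes, purely for illustration, that an image is split into a regular grid of segments and that $s$ randomly chosen segments are set to zero; the function name `delete_segments`, the grid size, and the random selection are hypothetical and may differ from the procedure used in the paper.

```python
import numpy as np

def delete_segments(image, s, grid=4, rng=None):
    """Zero out `s` randomly chosen cells of a `grid` x `grid` partition.

    image: 2-D array whose height and width are divisible by `grid`.
    Returns a degraded copy; the input is left untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    if h % grid or w % grid:
        raise ValueError("image size must be divisible by grid")
    if not 0 <= s <= grid * grid:
        raise ValueError("s must lie between 0 and grid * grid")
    out = image.copy()
    bh, bw = h // grid, w // grid
    # Pick s distinct segment indices without replacement.
    for idx in rng.choice(grid * grid, size=s, replace=False):
        r, c = divmod(int(idx), grid)
        out[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw] = 0
    return out
```

Sweeping `s` from 0 up to `grid * grid` on a 28 x 28 MNIST-sized image yields progressively emptier inputs, which is one way to trace accuracy against the amount of remaining information.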
