Neural Encoding for Image Recall: Human-Like Memory

Virgile Foussereau, Robin Dumas

TL;DR

This work models human-like image memory by storing high-level latent embeddings of images rather than raw pixels, and by perturbing each image before encoding to mimic the variability of human memory. It compares CLIP and AlexNet as encoders, uses Gaussian noise or blur as the perturbation, stores the embeddings in a k-d tree memory, and evaluates recall with forced-choice and repeat-detection tasks on natural-image and texture datasets. The strongest result comes from CLIP with Gaussian noise ($\sigma_n=20$), which reaches about $97$–$98\%$ accuracy on natural images and around $52\%$ on textures, closely mirroring the human performance pattern. A PCA of the stored embeddings links memory success to a focus on high-level content, which informs memory-augmented AI designs and underlines the critical role of encoder architecture.
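The pipeline summarized above (perturb, encode, store, then compare distances to memory) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `make_encoder` is a hypothetical random linear map standing in for CLIP or AlexNet, images are flat lists of pixel values in $[0, 255]$, the nearest-neighbour lookup is brute force (a k-d tree would return the same distance, only faster), and the forced-choice rule "pick the candidate closer to memory" is one plausible reading of the task.

```python
import math
import random


def make_encoder(in_dim, out_dim, rng):
    """Hypothetical stand-in for a pre-trained encoder: a fixed random linear map."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def encode(image):
        return [sum(w * x for w, x in zip(row, image)) for row in weights]

    return encode


def perturb(image, sigma, rng):
    """Add independent Gaussian noise to every pixel before encoding."""
    return [x + rng.gauss(0.0, sigma) for x in image]


def nearest_distance(memory, query):
    """Brute-force distance from the query to its nearest stored embedding."""
    return min(math.dist(stored, query) for stored in memory)


def forced_choice_accuracy(seen, novel, encode, sigma, rng):
    # Study phase: store one noisy encoding per seen image.
    memory = [encode(perturb(img, sigma, rng)) for img in seen]
    correct = 0
    for old, new in zip(seen, novel):
        # Test phase: both candidates are re-encoded with fresh noise,
        # and the one closer to memory is judged "seen before".
        d_old = nearest_distance(memory, encode(perturb(old, sigma, rng)))
        d_new = nearest_distance(memory, encode(perturb(new, sigma, rng)))
        correct += d_old < d_new
    return correct / len(seen)


rng = random.Random(0)
encode = make_encoder(in_dim=64, out_dim=16, rng=rng)
# Synthetic "images": random pixel vectors, used only to exercise the code.
seen = [[rng.uniform(0, 255) for _ in range(64)] for _ in range(50)]
novel = [[rng.uniform(0, 255) for _ in range(64)] for _ in range(50)]
accuracy = forced_choice_accuracy(seen, novel, encode, sigma=20.0, rng=rng)
print(f"forced-choice accuracy: {accuracy:.2f}")
```

Because the synthetic images here differ strongly from one another, the noise rarely moves a seen image closer to another item than to its own stored trace. Recall should degrade as the images become more alike in embedding space relative to the noise level, which is the kind of regime the texture results point to.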

Abstract

Achieving human-like memory recall in artificial systems remains a challenging frontier in computer vision. Humans demonstrate a remarkable ability to recall images after a single exposure, even after being shown thousands of images. However, this capacity diminishes significantly when they are confronted with non-natural stimuli such as random textures. In this paper, we present a method inspired by human memory processes to bridge this gap between artificial and biological memory systems. Our approach focuses on encoding images to mimic the high-level information retained by the human brain, rather than storing raw pixel data. By adding noise to images before encoding, we introduce variability akin to the non-deterministic nature of human memory encoding. Leveraging the embedding layers of pre-trained models, we explore how different architectures encode images and how this affects memory recall. Our method reaches 97% accuracy on natural images and near-chance performance (52%) on textures. We provide insights into the encoding process and its implications for machine learning memory systems, shedding light on the parallels between human and artificial memory mechanisms.

Paper Structure

This paper contains 10 sections, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: Memory process illustration: perturbation and projection in a latent space. For illustration, the latent space is represented with two dimensions but it is typically much larger.
  • Figure 2: Example images from ImageNet used in this project.
  • Figure 3: Example images from KTH-TIPS2 used in this project.
  • Figure 4: Forced-choice Test results with a total of 10,000 images.
  • Figure 5: PCA projection of the memory encodings on a plane for AlexNet (left) and CLIP (right) encodings.
  • ...and 2 more figures