Memory-Modular Classification: Learning to Generalize with Memory Replacement

Dahyun Kang, Ahmet Iscen, Eunchan Jo, Sua Choi, Minsu Cho, Cordelia Schmid

TL;DR

The paper introduces the memory-modular learner (MML), which decouples world knowledge from task-specific reasoning by leveraging an external, replaceable memory populated with web-crawled images and texts. By computing class prototypes from cross-modal consensus and using memory-based kNN retrieval with cross-attention, MML performs adaptive reasoning for zero-/few-shot and class-incremental classification without retraining the backbone, even under noisy memory conditions. Key contributions include memory construction and prototype generation, memory-aware reasoning with attention, and a comprehensive experimental study showing strong zero-shot transfer, robustness to data noise, and scalability to real-world tasks with modest compute. This approach provides a practical pathway to integrate up-to-date world knowledge into vision systems while preserving model stability and adaptability.

Abstract

We propose a novel memory-modular learner for image classification that separates knowledge memorization from reasoning. Our model enables effective generalization to new classes by simply replacing the memory contents, without the need for model retraining. Unlike traditional models that encode both world knowledge and task-specific skills into their weights during training, our model stores knowledge in the external memory of web-crawled image and text data. At inference time, the model dynamically selects relevant content from the memory based on the input image, allowing it to adapt to arbitrary classes by simply replacing the memory contents. The key differentiator is that our learner meta-learns to perform classification tasks with noisy web data from unseen classes, resulting in robust performance across various classification scenarios. Experimental results demonstrate the promising performance and versatility of our approach in handling diverse classification tasks, including zero-shot/few-shot classification of unseen classes, fine-grained classification, and class-incremental classification.
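
The memory-replacement idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it assumes toy 2-D features, plain per-class mean prototypes, and cosine-similarity scoring, and the names `class_prototypes`, `classify`, `memory_a`, and `memory_b` are hypothetical. The point is only that the classification function stays fixed while the set of recognizable classes changes with the memory passed in.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def class_prototypes(memory):
    """Average the stored features of each class (a simplifying assumption)."""
    prototypes = {}
    for name, feats in memory.items():
        dim = len(feats[0])
        prototypes[name] = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return prototypes


def classify(query, memory):
    """Return the class whose prototype is most similar to the query.

    No parameters are updated here: swapping `memory` is the only change
    needed to target a different set of classes.
    """
    prototypes = class_prototypes(memory)
    return max(prototypes, key=lambda name: cosine(query, prototypes[name]))


# Two hypothetical memories covering disjoint class sets.
memory_a = {"cat": [[1.0, 0.1], [0.9, 0.0]], "dog": [[0.0, 1.0], [0.1, 0.9]]}
memory_b = {"car": [[1.0, 0.2], [0.8, 0.1]], "boat": [[-1.0, 0.5], [-0.9, 0.4]]}

query = [0.95, 0.05]
pred_a = classify(query, memory_a)  # uses the first class set
pred_b = classify(query, memory_b)  # same function, replaced memory
```

In the paper the features would come from a pretrained encoder and the reasoning module is meta-learned; here both are replaced by fixed toy vectors and a nearest-prototype rule for brevity.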

Paper Structure

This paper contains 21 sections, 5 equations, 10 figures, 21 tables.

Figures (10)

  • Figure 1: Training and evaluation stages of MML for web-assisted zero-shot classification. Given the target classes, MML constructs image/text memory by text keyword search on the internet. The memory provides relevant image/text features, which are integrated via a trainable knowledge integration module (a). At evaluation, the memory can be replaced or detached from the model, so MML takes in new knowledge as memory while the rest of the model remains unchanged. Once trained, MML handles zero-shot classification on unseen classes with memory replacement (b) and incremental classes with memory expansion (c), using new knowledge collected from the web.
  • Figure 2: Memory-modular learner (MML) constructs image/text memory by web-crawling with text keyword search. Given a query image, its $k$NN features are retrieved from each memory and used for attentive knowledge integration. The class prototypes are constructed by averaging the memory elements with the highest cross-modal similarity. MML derives class reasoning with the nearest neighbors (NNs) from the external memory. This modular memory enables MML to perform web-assisted zero-/few-shot classification on unseen classes by memory replacement and class-incremental classification by memory expansion.
  • Figure 3: Effect of memory size on ImageNet-S
  • Figure 4: Examples of a query, image 2NNs and text 2NNs. Human faces are anonymized for visualization.
  • Figure 5: Effect of different memory types
  • ...and 5 more figures
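
The three operations named in the Figure 2 caption ($k$NN retrieval, attentive integration, and prototypes built from the items with the highest cross-modal similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's architecture: the attention has no learned projections, similarity is a plain dot product, and the function names `knn`, `attend`, and `consensus_prototype` as well as the `top_m` parameter are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def knn(query, memory, k):
    """Return the k memory features with the largest dot product to the query."""
    return sorted(memory, key=lambda m: dot(query, m), reverse=True)[:k]


def attend(query, neighbors):
    """Softmax attention of the query over retrieved neighbors.

    Assumption: no learned query/key/value projections, unlike a trained module.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, n) / scale for n in neighbors]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * n[i] for w, n in zip(weights, neighbors)) for i in range(len(query))]


def consensus_prototype(image_feats, text_feats, top_m):
    """Average the top_m image features ranked by mean similarity to the text features.

    Items that disagree with the other modality (e.g. noisy crawled images)
    rank low and are left out of the average.
    """
    def score(f):
        return sum(dot(f, t) for t in text_feats) / len(text_feats)

    kept = sorted(image_feats, key=score, reverse=True)[:top_m]
    return [sum(f[i] for f in kept) / len(kept) for i in range(len(kept[0]))]


# Toy 2-D memory; the last class image is a deliberately noisy outlier.
image_memory = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]

neighbors = knn(query, image_memory, k=2)
fused = attend(query, neighbors)

class_images = [[1.0, 0.0], [0.8, 0.6], [-1.0, 0.0]]
class_texts = [[1.0, 0.0]]
proto = consensus_prototype(class_images, class_texts, top_m=2)
```

Filtering by cross-modal agreement before averaging is one simple way to keep an outlier such as `[-1.0, 0.0]` from dragging the prototype away from the class; how the paper weights or selects items in detail is not stated in this summary.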