Memory-Modular Classification: Learning to Generalize with Memory Replacement
Dahyun Kang, Ahmet Iscen, Eunchan Jo, Sua Choi, Minsu Cho, Cordelia Schmid
TL;DR
The paper introduces the memory-modular learner (MML), which decouples world knowledge from task-specific reasoning by using an external, replaceable memory populated with web-crawled images and text. MML computes class prototypes from cross-modal consensus and applies memory-based kNN retrieval with cross-attention, so it can reason adaptively for zero-/few-shot and class-incremental classification without retraining the backbone, even when the memory is noisy. The key contributions are memory construction and prototype generation, memory-aware reasoning with attention, and an experimental study showing strong zero-shot transfer, robustness to data noise, and scalability to real-world tasks with modest compute. The approach offers a practical way to bring up-to-date world knowledge into vision systems while keeping the model stable and adaptable.
Abstract
We propose a novel memory-modular learner for image classification that separates knowledge memorization from reasoning. Our model enables effective generalization to new classes by simply replacing the memory contents, without the need for model retraining. Unlike traditional models that encode both world knowledge and task-specific skills into their weights during training, our model stores knowledge in an external memory of web-crawled image and text data. At inference time, the model dynamically selects relevant content from the memory based on the input image, allowing it to adapt to arbitrary classes by simply replacing the memory contents. The key differentiator is that our learner meta-learns to perform classification tasks with noisy web data from unseen classes, resulting in robust performance across various classification scenarios. Experimental results demonstrate the promising performance and versatility of our approach in handling diverse classification tasks, including zero-shot/few-shot classification of unseen classes, fine-grained classification, and class-incremental classification.
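The idea of adapting to new classes by swapping the memory, while the classifier itself stays fixed, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`cosine`, `classify`), the hand-written 3-dimensional embeddings, the `temperature` value, and the similarity-weighted vote are all assumptions made for illustration. In particular, the softmax weighting below is only a simple stand-in for the meta-learned cross-attention described in the abstract, and real embeddings would come from a pretrained image/text encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(query, memory, k=3, temperature=0.1):
    """Classify `query` using only the contents of `memory`.

    `memory` maps a class name to a list of embedding vectors
    (hypothetical stand-ins for web-crawled image/text features).
    No parameters depend on the class set, so swapping `memory`
    changes the label space without any retraining.
    """
    # Score every memory item against the query.
    scored = [
        (cosine(query, item), label)
        for label, items in memory.items()
        for item in items
    ]
    # kNN retrieval: keep the k most similar memory items.
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    # Softmax over retrieved similarities (assumed stand-in for attention).
    best = max(score for score, _ in top)
    weights = [math.exp((score - best) / temperature) for score, _ in top]
    total = sum(weights)
    # Aggregate normalized weights per class.
    votes = {}
    for weight, (_, label) in zip(weights, top):
        votes[label] = votes.get(label, 0.0) + weight / total
    return max(votes, key=votes.get), votes

# Two interchangeable memories with toy embeddings (illustrative values only).
animals = {
    "cat": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "dog": [[0.1, 1.0, 0.0], [0.2, 0.9, 0.1]],
}
vehicles = {
    "car": [[1.0, 0.0, 0.2], [0.8, 0.1, 0.3]],
    "bike": [[0.0, 0.2, 1.0], [0.1, 0.1, 0.9]],
}

query = [0.95, 0.15, 0.05]
label_a, votes_a = classify(query, animals)    # label space: cat / dog
label_b, votes_b = classify(query, vehicles)   # label space: car / bike
print(label_a, label_b)
```

The same `classify` function serves both label spaces; only the memory argument changes, which mirrors in miniature the memory-replacement behavior the abstract describes.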
