
GPS: Distilling Compact Memories via Grid-based Patch Sampling for Efficient Online Class-Incremental Learning

Mingchuan Ma, Yuhao Zhou, Jindi Lv, Yuxin Tian, Dan Si, Shujian Li, Qing Ye, Jiancheng Lv

TL;DR

The paper tackles online class-incremental learning under strict memory and access constraints by addressing the bottleneck of informative memory distillation. It introduces Grid-based Patch Sampling (GPS), a model-free, lightweight method that converts high-resolution images into compact, structure-preserving surrogates by sampling one pixel per grid patch. This lets the buffer hold many more memory exemplars under the same budget. GPS supports two replay pathways, concatenation-based reconstruction for training and NCM-based upsampling for inference, and requires neither bi-level optimization nor a converged backbone. Empirical results on CIFAR-100, Mini-ImageNet, and Tiny-ImageNet show 3–4% gains in average end accuracy over strong replay baselines with minimal computational overhead, positioning GPS as a plug-and-play, scalable solution for memory-efficient online continual learning.
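As a rough illustration of the sampling step, the sketch below assumes H x W x C NumPy images whose height and width are divisible by a grid size s (a hypothetical parameter name), and draws one pixel uniformly at random from each non-overlapping s x s patch. The function name grid_patch_sample is likewise hypothetical; this is a minimal sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def grid_patch_sample(image, s, rng=None):
    """Hypothetical GPS-style sampler: keep one random pixel per s x s patch.

    image: array of shape (H, W, C); H and W are assumed divisible by s.
    Returns a low-resolution surrogate of shape (H // s, W // s, C).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = image.shape
    if h % s or w % s:
        raise ValueError("H and W must be divisible by the grid size s")
    gh, gw = h // s, w // s
    # Independent random offset (dy, dx) inside every patch.
    dy = rng.integers(0, s, size=(gh, gw))
    dx = rng.integers(0, s, size=(gh, gw))
    # Absolute row/column of the chosen pixel in each patch, shape (gh, gw).
    rows = np.arange(gh)[:, None] * s + dy
    cols = np.arange(gw)[None, :] * s + dx
    # Advanced indexing gathers one pixel per patch -> (gh, gw, C).
    return image[rows, cols]

img = np.arange(32 * 32 * 3).reshape(32, 32, 3)  # toy 32x32 RGB "image"
sur = grid_patch_sample(img, s=4, rng=np.random.default_rng(0))
print(sur.shape)             # (8, 8, 3)
print(img.size // sur.size)  # 16
```

Because each surrogate keeps one pixel out of every s * s, the storage of one full image holds s * s surrogates under this assumption (16 for s = 4), which is the sense in which the buffer holds many more exemplars under the same budget.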

Abstract

Online class-incremental learning aims to enable models to continuously adapt to new classes with limited access to past data while mitigating catastrophic forgetting. Replay-based methods address this by maintaining a small memory buffer of previous samples and achieve competitive performance. To make replay effective under constrained storage, recent approaches leverage distilled data to increase the informativeness of memory. However, such approaches often incur significant computational overhead because they rely on bi-level optimization. Motivated by these limitations, we introduce Grid-based Patch Sampling (GPS), a lightweight and effective strategy for distilling informative memory samples without relying on a trainable model. GPS generates informative samples by sampling a subset of pixels from the original image, yielding compact low-resolution representations that preserve both semantic content and structural information. During replay, these representations are reassembled to support training and evaluation. Extensive experiments on benchmarks demonstrate that GPS can be seamlessly integrated into existing replay frameworks, yielding 3%-4% improvements in average end accuracy under memory-constrained settings with limited computational overhead.
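One way to read "reassembled" for training is to tile s * s low-resolution surrogates of the same class into a single mosaic at the original resolution. The sketch below assumes a row-major s x s layout and a hypothetical function name, concat_surrogates; the exact arrangement GPS uses is not specified in this summary, so treat it as an illustration of the idea only.

```python
import numpy as np

def concat_surrogates(surrogates, s):
    """Hypothetical reassembly: tile s*s same-class surrogates into one image.

    Each surrogate has shape (h, w, C); the result has shape (s*h, s*w, C).
    Placement is assumed row-major: surrogate k goes to tile (k // s, k % s).
    """
    if len(surrogates) != s * s:
        raise ValueError("need exactly s*s surrogates")
    # Concatenate s surrogates side by side to form each row of tiles.
    rows = [np.concatenate(surrogates[r * s:(r + 1) * s], axis=1) for r in range(s)]
    # Stack the tile rows vertically.
    return np.concatenate(rows, axis=0)

# 16 toy 8x8x3 surrogates, each filled with its own index so tiles are traceable.
surs = [np.full((8, 8, 3), k) for k in range(16)]
mosaic = concat_surrogates(surs, 4)
print(mosaic.shape)  # (32, 32, 3)
```

The mosaic matches the input resolution the backbone expects, so under this reading it can be fed to an unmodified training loop alongside incoming stream samples.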

Paper Structure

This paper contains 16 sections, 10 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Efficiency vs. performance trade-off on Mini-ImageNet with a buffer size of 100. The size of each bubble reflects GPU memory usage during training. Compared to existing baselines, our method GPS achieves the highest accuracy while maintaining comparable training time.
  • Figure 2: Illustration of RDED's limitation in the online continual learning setting. When the model has not yet converged, it may select misleading or suboptimal patches because its feature representations are unreliable.
  • Figure 3: Overview of the proposed Grid-based Patch Sampling (GPS) framework. Given a data stream with sequential tasks, each incoming sample is processed by the GPS module: (1) the image is partitioned into a uniform grid, (2) one representative pixel is randomly sampled from each patch, and (3) the sampled pixels are grouped to form a low-resolution surrogate. During replay, two strategies are used: (a) concatenation of GPS samples from the same class to reconstruct high-resolution training images; and (b) upsampling of each GPS sample for inference with a Nearest Class Mean (NCM) classifier. For visualization, each pixel is drawn as an enlarged region.
  • Figure 4: Comparison of training time (top) and GPU memory usage (bottom) on three datasets. As a case study, the buffer is sized to store approximately one exemplar per class.
  • Figure 5: Average end accuracy comparison between SCR and GPS under increasing input resolutions. Experiments are conducted under a constrained buffer budget with one exemplar per class. GPS achieves larger gains at higher resolutions, demonstrating better scalability.
  • ...and 3 more figures
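The inference pathway in the Figure 3 caption (upsample each GPS sample, then classify with a Nearest Class Mean rule) can be sketched as follows. Assumptions: nearest-neighbour upsampling by repeating pixels, Euclidean distance to class means, and a hypothetical embed function that flattens and L2-normalises the image in place of a trained backbone. All function names are illustrative, and the toy data is synthetic; it only demonstrates the mechanics.

```python
import numpy as np

def upsample_nearest(surrogate, s):
    """Nearest-neighbour upsampling: repeat each pixel s times along H and W."""
    return np.repeat(np.repeat(surrogate, s, axis=0), s, axis=1)

def embed(image):
    """Hypothetical stand-in for the backbone: flatten and L2-normalise."""
    v = image.astype(np.float64).ravel()
    return v / (np.linalg.norm(v) + 1e-12)

def class_means(images, labels):
    """NCM prototypes: the mean embedding of the memory samples of each class."""
    feats = np.stack([embed(x) for x in images])
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    return classes, means

def ncm_predict(image, classes, means):
    """Assign the class whose prototype is closest in Euclidean distance."""
    d = np.linalg.norm(means - embed(image), axis=1)
    return classes[np.argmin(d)]

rng = np.random.default_rng(0)

def toy(ch):
    """Synthetic 8x8x3 surrogate with one dominant colour channel."""
    x = rng.uniform(0.0, 0.1, (8, 8, 3))
    x[..., ch] += 1.0
    return x

# Memory of surrogates for two classes, upsampled to 32x32 (s = 4) before NCM.
mem_x = [upsample_nearest(toy(0), 4) for _ in range(5)] + \
        [upsample_nearest(toy(2), 4) for _ in range(5)]
mem_y = [0] * 5 + [1] * 5
classes, means = class_means(mem_x, mem_y)

query = np.zeros((32, 32, 3))
query[..., 2] = 1.0  # full-resolution test image dominated by channel 2
print(ncm_predict(query, classes, means))  # 1
```

Since NCM only needs class prototypes, the upsampled surrogates are used to compute means rather than to train the classifier head, which is one reason this pathway adds little compute under these assumptions.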