PDAC: Efficient Coreset Selection for Continual Learning via Probability Density Awareness

Junqi Gao, Zhichang Guo, Dazhi Zhang, Yao Li, Yi Ran, Biqing Qi

TL;DR

This work tackles efficient memory-buffer construction for rehearsal-based continual learning by showing that samples from high-density regions play the dominant role in suppressing the buffer-induced error. It introduces PDAC, which uses a Projected Gaussian Mixture to estimate joint sample density in a low-dimensional feature space and selects high-density samples to form the coreset, and SPDAC, which extends this approach to streaming data via a streaming EM update. The method achieves higher accuracy and lower forgetting than strong baselines while keeping selection cost low, and SPDAC shows robust performance in streaming CL with favorable runtime. The approach offers a theoretically grounded, density-driven alternative to bi-level optimization for coreset selection, with practical impact in both offline and streaming continual learning.
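The selection rule summarized above (project features to a low-dimensional space, fit a Gaussian mixture, keep the highest-density samples) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' PDAC implementation. A Gaussian random matrix stands in for the PGM projection, and each component has a diagonal covariance fitted by ordinary batch EM. Only feature vectors are scored, because the exact form of the paper's joint density is not given here, and the top-k samples are taken globally. The names `density_select`, `fit_gmm_diag`, and `random_projection_matrix` are hypothetical.

```python
import math
import random


def random_projection_matrix(D, d, seed=0):
    # Assumption: a Gaussian random D x d matrix stands in for the paper's projection.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(D)]


def project(X, W):
    # Map each feature vector x (length D) to x @ W (length d).
    d = len(W[0])
    return [[sum(x[k] * W[k][j] for k in range(len(x))) for j in range(d)] for x in X]


def log_gauss_diag(z, mean, var):
    # Log-density of a Gaussian with diagonal covariance diag(var).
    return -0.5 * sum(math.log(2 * math.pi * v) + (zi - m) ** 2 / v
                      for zi, m, v in zip(z, mean, var))


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def fit_gmm_diag(Z, L, iters=50, seed=0, reg=1e-6):
    # Batch EM for an L-component diagonal Gaussian mixture on projected features Z.
    rng = random.Random(seed)
    n, d = len(Z), len(Z[0])
    mu = [sum(z[j] for z in Z) / n for j in range(d)]
    glob_var = [sum((z[j] - mu[j]) ** 2 for z in Z) / n + reg for j in range(d)]
    means = [list(z) for z in rng.sample(Z, L)]      # init means at random samples
    variances = [list(glob_var) for _ in range(L)]   # init variances at the global variance
    weights = [1.0 / L] * L
    for _ in range(iters):
        # E-step: responsibilities p(component l | z_i).
        resp = []
        for z in Z:
            logs = [math.log(weights[l]) + log_gauss_diag(z, means[l], variances[l])
                    for l in range(L)]
            lse = logsumexp(logs)
            resp.append([math.exp(v - lse) for v in logs])
        # M-step: re-estimate mixing weights, means and variances.
        for l in range(L):
            nl = sum(r[l] for r in resp) + 1e-12
            weights[l] = nl / n
            means[l] = [sum(r[l] * z[j] for r, z in zip(resp, Z)) / nl for j in range(d)]
            variances[l] = [sum(r[l] * (z[j] - means[l][j]) ** 2 for r, z in zip(resp, Z)) / nl + reg
                            for j in range(d)]
    return weights, means, variances


def density_select(X, k, L=2, W=None, d=2, seed=0):
    # Score every sample by its mixture log-density in the projected space and
    # return the indices of the k highest-density samples.
    if W is None:
        W = random_projection_matrix(len(X[0]), d, seed)
    Z = project(X, W)
    weights, means, variances = fit_gmm_diag(Z, L, seed=seed)
    scores = [logsumexp([math.log(w) + log_gauss_diag(z, m, v)
                         for w, m, v in zip(weights, means, variances)]) for z in Z]
    return sorted(range(len(X)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: a dense 5x5 grid near the origin plus 10 sparse points on a circle of radius 5.
    dense = [[0.1 * i, 0.1 * j] for i in range(-2, 3) for j in range(-2, 3)]
    ring = [[5 * math.cos(2 * math.pi * t / 10), 5 * math.sin(2 * math.pi * t / 10)]
            for t in range(10)]
    print(density_select(dense + ring, k=10, W=[[1.0, 0.0], [0.0, 1.0]]))
```

On the toy data the selected indices should come from the dense grid (indices 0 to 24) rather than from the sparse circle. The small `reg` term only keeps variances positive. It does not by itself stop a component from collapsing onto an isolated point, so a practical version would need a larger variance floor or a more careful initialization.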

Abstract

Rehearsal-based Continual Learning (CL) maintains a limited memory buffer to store replay samples for knowledge retention, making these approaches heavily reliant on the quality of the stored samples. Current rehearsal-based CL methods typically construct the memory buffer by selecting a representative subset (referred to as a coreset), aiming to approximate the training efficacy of the full dataset with minimal storage overhead. However, mainstream Coreset Selection (CS) methods generally formulate CS as a bi-level optimization problem that requires numerous inner and outer iterations to solve, incurring substantial computational cost and thus limiting their practical efficiency. In this paper, we aim to provide a more efficient selection logic and scheme for coreset construction. To this end, we first analyze the Mean Squared Error (MSE) between the buffer-trained model and the Bayes-optimal model from the perspective of localized error decomposition, investigating how samples from different regions contribute to MSE suppression. Further theoretical and experimental analyses demonstrate that samples with high probability density play a dominant role in error suppression. Inspired by this, we propose the Probability Density-Aware Coreset (PDAC) method. PDAC leverages the Projected Gaussian Mixture (PGM) model to estimate each sample's joint density, enabling efficient density-prioritized buffer selection. Finally, we introduce the streaming Expectation Maximization (EM) algorithm to adapt the PGM parameters to streaming data, yielding Streaming PDAC (SPDAC) for streaming scenarios. Extensive comparative experiments show that our methods outperform other baselines across various CL settings while ensuring favorable efficiency.
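For the streaming variant, the summary states only that a streaming EM algorithm keeps the PGM parameters up to date. A common way to do this is stepwise (stochastic-approximation) online EM in the style of Cappé and Moulines, used here as an assumed stand-in rather than the paper's exact update. Each arriving sample triggers an E-step for that sample alone, and running averages of the per-component sufficient statistics are moved toward it with a decreasing step size. The step-size schedule `(t + 2) ** -alpha`, the diagonal covariances, and the top-K buffer policy in `stream_select` are hypothetical choices for illustration.

```python
import math


def log_gauss_diag(z, mean, var):
    # Log-density of a Gaussian with diagonal covariance diag(var).
    return -0.5 * sum(math.log(2 * math.pi * v) + (zi - m) ** 2 / v
                      for zi, m, v in zip(z, mean, var))


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


class StreamingDiagGMM:
    """Stepwise online EM for a diagonal Gaussian mixture (assumed stand-in).

    Keeps running averages of the sufficient statistics s0 = E[r], s1 = E[r z],
    s2 = E[r z^2] per component, updated with gamma_t = (t + 2) ** -alpha.
    """

    def __init__(self, init_means, init_var, alpha=0.6, reg=1e-6):
        L = len(init_means)
        self.alpha, self.reg, self.t = alpha, reg, 0
        self.s0 = [1.0 / L] * L
        self.s1 = [[m / L for m in mean] for mean in init_means]
        self.s2 = [[(init_var + m * m) / L for m in mean] for mean in init_means]

    def params(self):
        # Map sufficient statistics back to (weights, means, variances).
        means = [[a / s0 for a in s1] for s0, s1 in zip(self.s0, self.s1)]
        variances = [[max(b / s0 - mu * mu, 0.0) + self.reg for b, mu in zip(s2, mean)]
                     for s0, s2, mean in zip(self.s0, self.s2, means)]
        return list(self.s0), means, variances

    def _component_logs(self, z):
        w, means, variances = self.params()
        return [math.log(wl) + log_gauss_diag(z, m, v) for wl, m, v in zip(w, means, variances)]

    def log_density(self, z):
        return logsumexp(self._component_logs(z))

    def update(self, z):
        logs = self._component_logs(z)
        lse = logsumexp(logs)
        resp = [math.exp(v - lse) for v in logs]       # E-step for the new sample only
        gamma = (self.t + 2) ** (-self.alpha)
        for l, r in enumerate(resp):                    # stochastic-approximation M-step
            self.s0[l] += gamma * (r - self.s0[l])
            self.s1[l] = [a + gamma * (r * zj - a) for a, zj in zip(self.s1[l], z)]
            self.s2[l] = [b + gamma * (r * zj * zj - b) for b, zj in zip(self.s2[l], z)]
        self.t += 1


def stream_select(stream, K, model):
    # Hypothetical buffer policy: after each update, keep the K buffered samples
    # with the highest log-density under the current model.
    buf = []
    for z in stream:
        model.update(z)
        buf.append(z)
        if len(buf) > K:
            buf.sort(key=model.log_density, reverse=True)
            buf.pop()
    return buf
```

Each update touches only the new sample, so refreshing the model costs O(Ld) for L components in d dimensions, regardless of how many samples have already been seen. In this naive version, re-scoring the buffer adds O(KLd) per step plus a small sort.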

Paper Structure

This paper contains 27 sections, 5 theorems, 45 equations, 6 figures, 7 tables, 2 algorithms.

Key Result

Proposition 4.1

Under the partition $\{\mathcal{Z}_i\}_{i=1}^{L_m}$ of the sample space $\mathcal{Z}$, the error $\mathcal{R}_{\mathcal{M} \mid \mathcal{S}}$ has the following decomposition: where $\text{tr}(\cdot)$ denotes the trace operator, $\mathbb{E}_{\boldsymbol{z}\mid\mathcal{Z}_i}$ is the conditional expectation of $\boldsymbol{z}$ under the condition $\boldsymbol{z}\in\mathcal{Z}_i$, and $\text{Cov}_{\m
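The decomposition itself is truncated in this summary and is not reconstructed here. As general background only, the textbook law of total covariance over the same partition shares its ingredients (trace, conditional expectation, conditional covariance, region probabilities). It is written below in trace form for an arbitrary vector-valued function $\boldsymbol{g}$, a symbol introduced purely for illustration. This is a standard identity, not Proposition 4.1:

$$
\operatorname{tr}\,\mathrm{Cov}_{\boldsymbol{z}}\big[\boldsymbol{g}(\boldsymbol{z})\big]
= \sum_{i=1}^{L_m} p_i\,\operatorname{tr}\,\mathrm{Cov}_{\boldsymbol{z}\mid\mathcal{Z}_i}\big[\boldsymbol{g}(\boldsymbol{z})\big]
+ \sum_{i=1}^{L_m} p_i\,\Big\|\mathbb{E}_{\boldsymbol{z}\mid\mathcal{Z}_i}\big[\boldsymbol{g}(\boldsymbol{z})\big]-\mathbb{E}_{\boldsymbol{z}}\big[\boldsymbol{g}(\boldsymbol{z})\big]\Big\|_2^2,
\qquad p_i=\Pr(\boldsymbol{z}\in\mathcal{Z}_i).
$$

Each region's local term is weighted by its probability $p_i$. In this generic setting, that weighting shows how regions carrying more probability mass can dominate an aggregate error.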

Figures (6)

  • Figure 1: Comparison of the final average accuracy ($\%$) and average selection time (in seconds) after CL training on Split-CIFAR10 (left) and Split-CIFAR100 (right) for the proposed PDAC against other representative CS methods. The y-coordinate of each bubble's center represents the method's final average accuracy, while its size is positively correlated with its accuracy. The central marker indicates the method's performance point. PDAC achieves high performance while maintaining good efficiency.
  • Figure 2: (a)-(d): Boxplot of the conditional model variance across local regions with different resampling probabilities $\mathrm{p}_i$ (binned) under different memory buffer sizes $N$ ((a) corresponds to $N=10$, (b) to $N=100$, (c) to $N=1000$ and (d) to $N=10000$).
  • Figure 3: (a): Curves of $\mathcal{R}_{\mathcal{M} \mid \mathcal{S}}$ under different selection strategies across various values of $N$; (b): Average density of the regions containing the samples in $\mathcal{M}$ under different selection strategies.
  • Figure 4: Stepwise sample log-likelihood of the PGM component's EM iterations across different tasks. Panels (a) and (b) show the results on Split-CIFAR10 and Split-CIFAR100, respectively.
  • Figure 5: (a)-(b): ACC of PDAC on Split-CIFAR10 and Split-CIFAR100 with varying numbers of Gaussian components $L$, respectively; (c)-(d): ACC on Split-CIFAR10 and Split-CIFAR100 versus the projection dimensions $d$. Error bars represent the standard deviation across three independent runs.
  • ...and 1 more figure

Theorems & Definitions (5)

  • Proposition 4.1
  • Theorem 4.1
  • Corollary 4.1
  • Lemma 1
  • Lemma 2