Cut out and Replay: A Simple yet Versatile Strategy for Multi-Label Online Continual Learning

Xinrui Wang, Shao-yuan Li, Jiaqiang Zhang, Songcan Chen

TL;DR

Multi-label online continual learning (MOCL) must contend with pervasive missing labels and long-tailed class distributions in streaming multi-label data. The authors propose CUTER, a simple plug-in strategy that identifies, strengthens, and replays label-specific regions to support fine-grained experience replay. It comprises three components: a zero-shot localization assessment of pre-trained models using the average Fiedler value $\lambda_2$ of patch graphs, selective replay via label-region matching with MCut to extract object crops for memory, and a localization-aware regularization using the nuclear norm $\|A\|_*$ to stabilize patch graphs. Experiments on VOC 2007, MS-COCO, and NUS-WIDE show state-of-the-art MOCL performance and confirm the method's plug-in compatibility with existing approaches. The work grounds the method in spectral graph theory and discusses the trade-offs in computational overhead and backbone choice.
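The two spectral quantities named in this summary, the Fiedler value $\lambda_2$ of a patch graph and the nuclear norm $\|A\|_*$, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The thresholded cosine-similarity affinity, the threshold `tau`, the floor weight `eps`, the use of the symmetric normalized Laplacian, and all function names are assumptions chosen for illustration, and the random vectors stand in for real patch embeddings from a pre-trained backbone.

```python
import numpy as np


def patch_affinity(features, tau=0.2, eps=1e-5):
    """Build a weighted patch graph from patch features (assumed construction).

    Edges get weight 1 where cosine similarity >= tau, and a small eps otherwise
    so that the graph stays connected.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    return np.where(sim >= tau, 1.0, eps)


def fiedler_value(W):
    """Second-smallest eigenvalue of the symmetric normalized Laplacian of W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(L_sym)  # returned in ascending order
    return float(eigvals[1])


def average_fiedler(feature_batches, tau=0.2):
    """Average the Fiedler value over a collection of per-image patch features."""
    return float(np.mean([fiedler_value(patch_affinity(f, tau)) for f in feature_batches]))


def nuclear_norm(A):
    """Nuclear norm ||A||_*: the sum of the singular values of A."""
    return float(np.linalg.norm(A, ord="nuc"))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n = 8, 10
    e1, e2 = np.eye(dim)[0], np.eye(dim)[1]

    # Synthetic "image" whose patches fall into two clearly separated groups.
    two_groups = np.vstack([
        e1 + 0.01 * rng.standard_normal((n, dim)),
        e2 + 0.01 * rng.standard_normal((n, dim)),
    ])
    # Synthetic "image" whose patches all look alike.
    one_group = e1 + 0.01 * rng.standard_normal((2 * n, dim))

    print("Fiedler value, two groups:", fiedler_value(patch_affinity(two_groups)))
    print("Fiedler value, one group: ", fiedler_value(patch_affinity(one_group)))
    print("Average over both images: ", average_fiedler([two_groups, one_group]))
    print("Nuclear norm of affinity: ", nuclear_norm(patch_affinity(two_groups)))
```

Under these assumptions, a patch graph that splits cleanly into two groups has a Fiedler value close to zero, while a graph with no clear partition has a much larger one. That gap is the intuition behind using $\lambda_2$ as a label-free signal of how separable the patches are.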

Abstract

Multi-Label Online Continual Learning (MOCL) requires models to learn continuously from endless multi-label data streams, facing complex challenges including persistent catastrophic forgetting, potential missing labels, and uncontrollable imbalanced class distributions. While existing MOCL methods attempt to address these challenges through various techniques, they all overlook label-specific region identifying and feature learning - a fundamental solution rooted in multi-label learning but challenging to achieve in the online setting with incremental and partial supervision. To this end, we first leverage the inherent structural information of input data to evaluate and verify the innate localization capability of different pre-trained models. Then, we propose CUTER (CUT-out-and-Experience-Replay), a simple yet versatile strategy that provides fine-grained supervision signals by further identifying, strengthening and cutting out label-specific regions for efficient experience replay. It not only enables models to simultaneously address catastrophic forgetting, missing labels, and class imbalance challenges, but also serves as an orthogonal solution that seamlessly integrates with existing approaches. Extensive experiments on multiple multi-label image benchmarks demonstrate the superiority of our proposed method. The code is available at https://github.com/wxr99/Cut-Replay.

Paper Structure

This paper contains 26 sections, 3 theorems, 27 equations, 8 figures, 7 tables, 1 algorithm.

Key Result

Lemma 2.2

[chung1997spectral] For a weighted undirected graph $G$, let $\lambda_2$ be its Fiedler value, $h(G)$ be its Cheeger constant, and $\Delta = \max_{i} d(i)$ be the maximum degree in the graph. Then we have:

Figures (8)

  • Figure 1: Two unique challenges in MOCL compared with traditional OCL: (1) Massive missing past and future labels in both the incoming data stream and the memory buffer. (2) Severe class imbalance that persists in the memory buffer even with re-balancing strategies like CEBS [wei2019does, yan2021framework].
  • Figure 2: Correlation between the averaged Fiedler Value and zero-shot detection performance ($AP_{50}$) on Pascal VOC07 and MSCOCO.
  • Figure 3: Visual comparison of detection (coarse bounding boxes) and segmentation (coarse masks) capabilities across pre-trained models using a ViT-S/16 backbone, obtained via two-round MaskCut [wang2023cut].
  • Figure 4: Class distribution in the memory buffer (size=1000) for different re-balancing methods after training on the VOC dataset.
  • Figure 5: Visualization of the model's zero-shot localization capability on the PASCAL VOC dataset during MOCL training.
  • ...and 3 more figures

Theorems & Definitions (4)

  • Definition 2.1
  • Lemma 2.2
  • Theorem 2.3
  • Lemma 3.2