Generative Active Learning for Long-tailed Instance Segmentation

Muzhi Zhu, Chengxiang Fan, Hao Chen, Yang Liu, Weian Mao, Xiaogang Xu, Chunhua Shen

TL;DR

This work tackles using unlimited, noisy generated data to improve long-tailed instance segmentation. It introduces BSGAL, a batched streaming generative active learning method that online-estimates each batch's contribution via a gradient-based signal and a momentum gradient cache, enabling effective filtering and utilization of generated data. Empirical results on CIFAR-10 (offline) and LVIS (online) show that selective use of generated data yields meaningful gains over unfiltered and CLIP-filtered baselines, with pronounced improvements for rare categories. The approach advances practical deployment by providing a scalable, data-diversity-preserving framework that bridges generative data with complex perception tasks.

Abstract

Recently, large-scale language-image generative models have gained widespread attention, and many works have used data generated by these models to further enhance the performance of perception tasks. However, not all generated data positively impacts downstream models, and these methods do not thoroughly explore how to better select and utilize generated data. Moreover, research on active learning for generated data is still lacking. In this paper, we explore how to perform active learning specifically for generated data in the long-tailed instance segmentation task. We then propose BSGAL, a new algorithm that estimates the contribution of generated data online based on a gradient cache. BSGAL can handle unlimited generated data and complex downstream segmentation tasks effectively. Experiments show that BSGAL outperforms the baseline approach and effectively improves the performance of long-tailed segmentation. Our code can be found at https://github.com/aim-uofa/DiverGen.
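
The gradient-cache idea summarized above can be illustrated with a small sketch. This is not the authors' implementation: the class `GradientCache`, the functions `estimate_contribution` and `accept_batch`, the momentum value, and the zero acceptance threshold are all hypothetical names and choices made for illustration. The sketch assumes that gradients are available as flat NumPy vectors, that the cache is an exponential moving average of gradients from trusted (real) batches, and that a generated batch is scored by the first-order loss decrease that one SGD step along its gradient would produce on the real data.

```python
import numpy as np


class GradientCache:
    """Exponential moving average of gradients from trusted (real) batches.

    Hypothetical helper for illustration; not taken from the paper's code.
    """

    def __init__(self, dim, momentum=0.9):
        self.momentum = momentum
        self.value = np.zeros(dim)

    def update(self, grad):
        # Blend the new real-data gradient into the running average.
        grad = np.asarray(grad, dtype=float)
        self.value = self.momentum * self.value + (1.0 - self.momentum) * grad
        return self.value


def estimate_contribution(cache, gen_grad, lr):
    """First-order estimate of the real-data loss decrease.

    For an SGD step theta' = theta - lr * gen_grad, a first-order expansion
    gives loss(theta) - loss(theta') ~= lr * <cached_grad, gen_grad>.
    """
    gen_grad = np.asarray(gen_grad, dtype=float)
    return lr * float(np.dot(cache.value, gen_grad))


def accept_batch(cache, gen_grad, lr, threshold=0.0):
    """Accept a generated batch only if its estimated contribution is positive
    (the threshold of 0.0 is an assumption of this sketch)."""
    return estimate_contribution(cache, gen_grad, lr) > threshold


if __name__ == "__main__":
    cache = GradientCache(dim=3, momentum=0.5)
    cache.update([1.0, 0.0, 0.0])            # gradient from a real batch
    aligned = [2.0, 0.0, 0.0]                # generated batch pointing the same way
    opposed = [-2.0, 0.0, 0.0]               # generated batch pointing the opposite way
    print(accept_batch(cache, aligned, lr=0.1))
    print(accept_batch(cache, opposed, lr=0.1))
```

In this toy setting, a generated batch whose gradient aligns with the cached real-data gradient receives a positive score and is kept, while one pointing the opposite way is rejected. The momentum average is used here so that the reference direction does not depend on a single noisy real batch.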

Paper Structure

This paper contains 32 sections, 1 theorem, 8 equations, 9 figures, 12 tables, 3 algorithms.

Key Result

Lemma 4.1

The loss of a network $f$ on a dataset $\mathcal{U}$ can be approximated by a first-order approximation:
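
The displayed formula for the lemma does not appear in this listing. As a hedged illustration only, a generic first-order Taylor expansion of the loss around the current parameters is written out here. The symbols $\ell$ (loss), $\theta$ and $\theta'$ (parameters before and after an update), $\eta$ (learning rate), and $g$ (update gradient) are assumed notation and may differ from the paper's exact statement.

```latex
% Generic first-order expansion (assumed notation, not the paper's exact lemma)
\ell(f_{\theta'}, \mathcal{U})
  \approx \ell(f_{\theta}, \mathcal{U})
  + \nabla_{\theta}\,\ell(f_{\theta}, \mathcal{U})^{\top} (\theta' - \theta)

% Special case of one SGD step, \theta' = \theta - \eta g:
\ell(f_{\theta'}, \mathcal{U})
  \approx \ell(f_{\theta}, \mathcal{U})
  - \eta\, \nabla_{\theta}\,\ell(f_{\theta}, \mathcal{U})^{\top} g
```

Under this approximation, the change in loss on $\mathcal{U}$ caused by an update is governed by the inner product between the gradient on $\mathcal{U}$ and the update direction, which is why a gradient-based signal can serve as a contribution estimate.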

Figures (9)

  • Figure 1: Comparison between Traditional Active Learning and Generative Active Learning frameworks. (a) Traditional Active Learning relies on a human oracle, so annotations are accurate but the budget is limited, and the model must select the most informative unlabeled data. (b) Generative Active Learning relies on a generative oracle and has an unlimited labeled pool. However, annotation quality varies greatly, so the model must judiciously accept data.
  • Figure 2: The distribution of contributions under different noise scales.
  • Figure 3: The best and worst samples found using our contribution estimation function for the LVIS class 'bun'.
  • Figure 4: Performance of the model under different iterations.
  • Figure 5: Model performance when using different amounts of generated data.
  • ...and 4 more figures

Theorems & Definitions (6)

  • Lemma 4.1
  • Remark 4.2
  • Remark 4.3
  • Definition 4.4
  • Remark 4.5
  • Remark 4.6