Dual-Stage Invariant Continual Learning under Extreme Visual Sparsity

Rangya Zhang, Jiaping Xiao, Lu Bai, Yuhang Zhang, Mir Feroskhan

Abstract

Continual learning seeks to maintain stable adaptation in non-stationary environments, yet this problem becomes particularly challenging in object detection, where most existing methods implicitly assume relatively balanced visual conditions. In extreme-sparsity regimes, such as those observed in space-based resident space object (RSO) detection scenarios, foreground signals are overwhelmingly dominated by background observations. Under such conditions, we analytically demonstrate that background-driven gradients destabilize the feature backbone during sequential domain shifts, causing progressive representation drift. This exposes a structural limitation of continual learning approaches that rely solely on output-level distillation, as they fail to preserve intermediate representation stability. To address this, we propose a dual-stage invariant continual learning framework via joint distillation, enforcing structural consistency on backbone representations and semantic consistency on detection predictions, thereby suppressing error propagation at its source while maintaining adaptability. Furthermore, to regulate gradient statistics under severe imbalance, we introduce a sparsity-aware data conditioning strategy combining patch-based sampling and distribution-aware augmentation. Experiments on a high-resolution space-based RSO detection dataset show consistent improvement over established continual object detection methods, achieving an absolute gain of +4.0 mAP under sequential domain shifts.
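To make the joint-distillation idea concrete, the following is a minimal PyTorch sketch of how the two consistency terms described above could be combined. It is an illustration under assumptions, not the paper's implementation: the function name, the weighting coefficients (`lambda_feat`, `lambda_out`), the MSE/KL loss choices, and the tensor shapes are hypothetical stand-ins for whatever the authors actually use.

```python
import torch
import torch.nn.functional as F


def joint_distillation_loss(feat_student: torch.Tensor,
                            feat_teacher: torch.Tensor,
                            pred_student: torch.Tensor,
                            pred_teacher: torch.Tensor,
                            lambda_feat: float = 1.0,
                            lambda_out: float = 1.0) -> torch.Tensor:
    """Hypothetical two-term distillation loss.

    feat_*: backbone feature maps (B, C, H, W) from the adapting model
            and a frozen copy trained on previous domains.
    pred_*: detection-head class logits with matching shapes.
    """
    # Stage 1: structural consistency on intermediate backbone
    # representations, suppressing representation drift at its source.
    l_feat = F.mse_loss(feat_student, feat_teacher.detach())

    # Stage 2: semantic consistency on detection predictions,
    # matching the student's soft outputs to the teacher's.
    l_out = F.kl_div(
        F.log_softmax(pred_student, dim=-1),
        F.softmax(pred_teacher.detach(), dim=-1),
        reduction="batchmean",
    )
    return lambda_feat * l_feat + lambda_out * l_out
```

Anchoring an extra consistency term at the feature level, rather than only at the outputs, is the crux of the abstract's argument: under extremely sparse foregrounds, background-driven gradients can reshape the backbone even while output-level predictions still appear to agree.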

Figures (6)

  • Figure 1: Overall framework of our method. The top-left panel illustrates the space-based observation configuration. The top-right panel shows our sparsity-aware data conditioning pipeline. The bottom panel depicts the dual-stage invariance in continual object detection (COD).
  • Figure 2: Overview of dataset characteristics. (a) Raw $4418\times4418$ 16-bit grayscale image with sparse targets barely visible. (b) Reference image after contrast enhancement for visualization. (c) Zoomed regions highlighting LEO (blue), MEO (green), and GEO (red) RSOs. (d) Per-class target count (top) and target size distribution (bottom) for the shown example.
  • Figure 3: Temporal target distributions across all four camera datasets. Each subplot shows the per-frame target count over time, with LEO/MEO/GEO class breakdown. Background regions labeled $D_1$ to $D_4$ indicate domain divisions.
  • Figure 4: Spatial distributions of target centers across three consecutive subsets of 500 images each (Domain 1 to Domain 3). Each heatmap reflects the local density of targets per grid cell, clipped at a maximum count of 5.
  • Figure 5: Pairwise t-SNE visualizations with density contours overlaid for each domain. Contour maps highlight structural differences in feature distributions across domains.
  • ...and 1 more figure