Looking Beyond the Known: Towards a Data Discovery Guided Open-World Object Detection

Anay Majee, Amitesh Gangrade, Rishabh Iyer

TL;DR

This work tackles Open-World Object Detection (OWOD), where detectors must discover unknown objects and incrementally learn new classes without retraining on all data. It introduces CROWD, a data-discovery guided framework that interleaves CROWD-Discover (SCG-based unknown mining) with CROWD-Learn (representation learning driven by combinatorial, submodular objectives) to separate known and unknown representations while preserving prior knowledge. By employing Submodular Conditional Gain (SCG) and related submodular functions (e.g., Graph-Cut, Facility-Location, and Log-Determinant), CROWD improves known-class accuracy by 2.83% and 2.05% on the M-OWODB and S-OWODB benchmarks, respectively, reaches nearly 2.4x the unknown recall of leading baselines, and generalizes better to Incremental Object Detection (IOD). The approach demonstrates the value of a set-based, combinatorial perspective for open-world learning, with practical impact on scalable, continual detection systems, and suggests directions for further refinement of submodular objectives and constraints.

Abstract

Open-World Object Detection (OWOD) enriches traditional object detectors by enabling continual discovery and integration of unknown objects via human guidance. However, existing OWOD approaches frequently suffer from semantic confusion between known and unknown classes, alongside catastrophic forgetting, leading to diminished unknown recall and degraded known-class accuracy. To overcome these challenges, we propose Combinatorial Open-World Detection (CROWD), a unified framework reformulating unknown object discovery and adaptation as an interwoven combinatorial (set-based) data-discovery (CROWD-Discover) and representation learning (CROWD-Learn) task. CROWD-Discover strategically mines unknown instances by maximizing Submodular Conditional Gain (SCG) functions, selecting representative examples distinctly dissimilar from known objects. Subsequently, CROWD-Learn employs novel combinatorial objectives that jointly disentangle known and unknown representations while maintaining discriminative coherence among known classes, thus mitigating confusion and forgetting. Extensive evaluations on OWOD benchmarks illustrate that CROWD achieves improvements of 2.83% and 2.05% in known-class accuracy on M-OWODB and S-OWODB, respectively, and nearly 2.4x unknown recall compared to leading baselines.
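
For readers unfamiliar with the term, the conditional gain referred to above can be written down compactly. The first expression is the standard definition from the submodular information measures literature. The second is only a sketch of how the selection step reads under that definition: the candidate pool $R$, the budget $k$, and the conditioning set $P$ (taken here to be the known and background instances) are assumed notation, not necessarily the paper's exact formulation.

$$
f(A \mid P) = f(A \cup P) - f(P), \qquad U^t \in \operatorname*{arg\,max}_{A \subseteq R,\ |A| \le k} f(A \mid P)
$$

Intuitively, a candidate set $A$ scores highly only if it adds information beyond what $P$ already covers, which is what pushes the mined unknowns away from known objects.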

Paper Structure

This paper contains 22 sections, 3 theorems, 19 equations, 7 figures, 10 tables, 2 algorithms.

Key Result

Theorem A.1

Given a set of known RoIs $K^t_i$, $i \in [1, C^t]$, a set of unknown RoIs $U^t$ ($\mathcal{T} = K^t \cup U^t$) and the Facility-Location based submodular function $f$ defined over any set $A$ s.t. $f(A) = \sum_{i \in \mathcal{T}} \max_{j \in A} s_{ij}$, we define the CROWD-FL learning objective to lear...
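
The Facility-Location function in the theorem statement is easy to evaluate directly. The snippet below is a minimal sketch, not the CROWD-FL objective itself (which is truncated above): it only computes $f(A) = \sum_{i \in \mathcal{T}} \max_{j \in A} s_{ij}$ and checks the diminishing-returns property on toy data. The function name `facility_location`, the random embeddings, and the rescaled cosine similarity used for $s_{ij}$ are all assumptions made for illustration.

```python
import numpy as np

def facility_location(sim, subset):
    """f(A) = sum over all rows i of max_{j in A} sim[i, j]; f(empty set) = 0."""
    if len(subset) == 0:
        return 0.0
    return float(sim[:, list(subset)].max(axis=1).sum())

# Toy ground set: 8 unit-norm embeddings standing in for RoI features (assumed).
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 4))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Cosine similarity rescaled to [0, 1] so that all s_ij are non-negative.
sim = (emb @ emb.T + 1.0) / 2.0

# Diminishing returns: adding element e to a smaller set helps at least as much
# as adding it to a superset of that set.
small, large, e = [0], [0, 1, 2], 5
gain_small = facility_location(sim, small + [e]) - facility_location(sim, small)
gain_large = facility_location(sim, large + [e]) - facility_location(sim, large)
print(gain_small, gain_large)
```

With non-negative similarities this function is monotone and submodular, so `gain_small` should never fall below `gain_large`, and selecting the whole ground set gives every element its self-similarity of 1.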

Figures (7)

  • Figure 2: Interleaved Data-Discovery and Representation Learning in CROWD on an incoming task $T_t$. CROWD takes as input the model weights from $T_{t-1}$ and a small replay buffer of previously known classes $\hat{K}^{t -1}$, then applies (a) CROWD-D to discover unknown RoIs and (b) CROWD-L to learn discriminative features of both known and unknown instances, returning an updated model $h^{t+1}$ and the current task replay buffer $\hat{K}^t$.
  • Figure 3: Illustration of the data-discovery pipeline in CROWD-D on a synthetic dataset with $|\mathtt{R}| = 500$ and budget $\mathtt{k} = 10$, using Graph-Cut as the underlying submodular function. CROWD-D selects unknown instances $U^t$ that are dissimilar to both background $B^t$ and known $K^t$ instances.
  • Figure 4: Characterization of losses in CROWD-L on a synthetic two-cluster imbalanced dataset by increasing known vs. unknown class separation (cases 1 through 3) similar to the RoI embedding space of $h^t(.; \theta)$. The synthetic dataset generation is performed under the same seed.
  • Figure 5: Qualitative results from CROWD contrasted against OrthogonalDet (Sun et al., 2024), showing that our approach (a) mitigates confusion, (b) generalizes to unknowns, and (c) reduces forgetting.
  • Figure 6: CROWD-D results on a synthetic dataset, contrasted across instances of popular submodular functions: Graph-Cut, Facility-Location and Log-Determinant. The Graph-Cut based selection strategy models both representation and diversity, resulting in the best possible choice of unknown instances in $U^t$.
  • ...and 2 more figures
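
The selection behaviour described in the Figure 3 and Figure 6 captions can be reproduced in miniature. The sketch below is a hedged illustration, not the authors' implementation: it greedily maximizes a Graph-Cut conditional gain $f(A \mid P) = f(A \cup P) - f(P)$ with $f(S) = \sum_{i \in R} \sum_{j \in S} s_{ij} - \lambda \sum_{i, j \in S} s_{ij}$. Only the pool size of 500 and the budget of 10 mirror the caption. The RBF kernel, the value of `lam`, the two-cluster layout, the use of a single conditioning set `cond` for known (and background) instances, and all function names are assumptions.

```python
import numpy as np

def rbf_similarity(x, y, gamma=0.5):
    """Pairwise RBF similarities in (0, 1]; the kernel choice is an assumption."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def greedy_graph_cut_cg(cand, cond, k, lam=1.0):
    """Greedily pick k candidate indices that maximize f(A | P) = f(A u P) - f(P)
    for the Graph-Cut function f described in the lead-in (P = cond)."""
    s_cc = rbf_similarity(cand, cand)       # candidate-candidate similarities
    s_cp = rbf_similarity(cand, cond)       # candidate-conditioning similarities
    rep = s_cc.sum(axis=0)                  # representation term: sum_i s_ie
    cond_pen = 2.0 * s_cp.sum(axis=1)       # 2 * sum_{j in P} s_ej (fixed)
    self_sim = np.diag(s_cc)                # s_ee
    sel_pen = np.zeros(len(cand))           # 2 * sum_{j in A} s_ej (grows with A)
    available = np.ones(len(cand), dtype=bool)
    selected = []
    for _ in range(k):
        # Marginal gain of adding e to the current selection, given P.
        gain = rep - lam * (cond_pen + sel_pen + self_sim)
        gain = np.where(available, gain, -np.inf)
        e = int(np.argmax(gain))
        selected.append(e)
        available[e] = False
        sel_pen += 2.0 * s_cc[:, e]
    return selected

# Synthetic pool of 500 candidates: indices 0-249 resemble the known cluster,
# indices 250-499 form a separate "novel" cluster.
rng = np.random.default_rng(0)
known_like = rng.normal(loc=0.0, scale=0.5, size=(250, 2))
novel = rng.normal(loc=4.0, scale=0.5, size=(250, 2))
cand = np.vstack([known_like, novel])
cond = rng.normal(loc=0.0, scale=0.5, size=(200, 2))  # known instances (P)

selected = greedy_graph_cut_cg(cand, cond, k=10)
print(sorted(selected))
```

In this layout the penalty for similarity to `cond` outweighs the representation term for candidates near the known cluster, so one would expect the selected indices to fall in the novel cluster. The pairwise penalty within the selection is what keeps the picks from collapsing onto a single dense spot.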

Theorems & Definitions (6)

  • Theorem A.1
  • proof
  • Theorem A.2
  • proof
  • Theorem A.3
  • proof