TARO: Toward Semantically Rich Open-World Object Detection
Yuchen Zhang, Yao Lu, Johannes Betz
TL;DR
TARO tackles open-world object detection by moving beyond labeling unknowns as a single class and instead categorizing them into coarse, semantically meaningful parents within a taxonomy. It extends DETR-based detectors with three key components: a sparsemax-based objectness head that allocates a sparse competition among queries, a hierarchy-aware activation that couples parent-child predictions, and a hierarchy-guided relabeling strategy that provides auxiliary supervision for objectness using non-leaf activations. Empirical results on OWOD and OW-DETR splits show TARO achieves higher unknown recall, reduces confusion between known and unknown objects, and can categorize unknowns with up to 29.9% Hierarchy Accuracy on the OWOD Split, while maintaining competitive known-class mAP. The work also analyzes ablations and discusses future directions, including leveraging Vision-Language Models and multimodal data to further enhance semantic understanding of unknowns in open spaces.
Abstract
Modern object detectors are largely confined to a "closed-world" assumption, limiting them to a predefined set of classes and posing risks when encountering novel objects in real-world scenarios. While open-set detection methods aim to address this by identifying such instances as 'Unknown', this is often insufficient. Rather than treating all unknowns as a single class, assigning them more descriptive subcategories can enhance decision-making in safety-critical contexts. For example, identifying an object as an 'Unknown Animal' (requiring an urgent stop) versus 'Unknown Debris' (requiring a safe lane change) is far more useful than just 'Unknown' in autonomous driving. To bridge this gap, we introduce TARO, a novel detection framework that not only identifies unknown objects but also classifies them into coarse parent categories within a semantic hierarchy. TARO employs a unique architecture with a sparsemax-based head for modeling objectness, a hierarchy-guided relabeling component that provides auxiliary supervision, and a classification module that learns hierarchical relationships. Experiments show TARO can categorize up to 29.9% of unknowns into meaningful coarse classes, significantly reduce confusion between unknown and known classes, and achieve competitive performance in both unknown recall and known mAP. Code will be made available.
