
Hyp-OW: Exploiting Hierarchical Structure Learning with Hyperbolic Distance Enhances Open World Object Detection

Thang Doan, Xin Li, Sima Behpour, Wenbin He, Liang Gou, Liu Ren

TL;DR

The paper tackles Open World Object Detection by addressing the lack of explicit unknown labeling and the weak linkage between known and unknown classes. It introduces Hyp-OW, a method that learns hierarchical representations in hyperbolic space through a Hyperbolic Contrastive Loss, a SuperClass Regularizer, and an Adaptive Relabeling scheme to detect unknowns. Empirical results show consistent improvements in both unknown recall and known detection across hierarchical and semi-hierarchical benchmarks, with ablations highlighting the contribution of each component and the role of hyperbolic curvature. The approach offers a principled way to exploit semantic hierarchies for open-world perception, with potential applicability to related tasks like OOD detection and instance segmentation, and points toward leveraging vision–language priors to further enhance unknown-object retrieval.

Abstract

Open World Object Detection (OWOD) is a challenging and realistic task that extends beyond the scope of the standard Object Detection task. It involves detecting both known and unknown objects while integrating learned knowledge for future tasks. However, the level of "unknownness" varies significantly depending on the context. For example, a tree is typically considered part of the background in a self-driving scene, but it may be significant in a household context. We argue that this contextual information should already be embedded within the known classes. In other words, there should be a semantic or latent structural relationship between the known items and the unknown items to be discovered. Motivated by this observation, we propose Hyp-OW, a method that learns and models a hierarchical representation of known items through a SuperClass Regularizer. Leveraging this representation allows us to effectively detect unknown objects using a similarity distance-based relabeling module. Extensive experiments on benchmark datasets demonstrate the effectiveness of Hyp-OW, achieving improvements in both known and unknown detection (up to 6 percent). These findings are particularly pronounced in our newly designed benchmark, where a strong hierarchical structure exists between known and unknown objects. Our code can be found at https://github.com/boschresearch/Hyp-OW

Paper Structure (51 sections, 10 equations, 17 figures, 15 tables)

This paper contains 51 sections, 10 equations, 17 figures, 15 tables.

Figures (17)

  • Figure 1: t-SNE plot of the learned class representations, with colors representing their respective categories. Our SuperClass Regularizer (right) learns the hierarchical structure by grouping together classes from the same category while pushing apart those from different categories.
  • Figure 2: Overview of Hyp-OW, which comprises three core components: the Hyperbolic Contrastive Loss, for representation learning at the class level; the SuperClass Regularizer, for semantic relationships at the category level; and the Adaptive Relabeling module, for unknown retrieval with the previously learned representation. If the distance $d$ between a candidate proposal and known items is lower than a threshold ($\delta$), the proposal is relabeled as unknown.
  • Figure 3: Semantic Similarity between knowns and unknowns across tasks for each Split.
  • Figure 4: t-SNE plot of the learned class representations. Hyperbolic Distance tends to learn a better hierarchical structure than Cosine Distance.
  • Figure 5: Hyperbolic Category - Class Distance Heatmap. The SuperClass Regularizer (right) effectively separates different categories (left), as indicated by the increased distance between each animal class (bottom) and the vehicle, outdoor, and furniture categories (darker colors). Without this regularizer (left), inter-category distances are much smaller (lighter color intensity).
  • ...and 12 more figures
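
The relabeling rule in the Figure 2 caption (a proposal becomes "unknown" when its distance $d$ to known items falls below $\delta$) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes embeddings live in a Poincaré ball of curvature `c`, uses the standard Poincaré-ball distance, and takes the minimum distance over known items, which is an assumption about how "distance to known items" is aggregated. The function names `mobius_add`, `hyperbolic_distance`, and `relabel_as_unknown` are hypothetical.

```python
import math


def mobius_add(x, y, c=1.0):
    """Mobius addition of two points in the Poincare ball of curvature c."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    return [
        ((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / denom
        for a, b in zip(x, y)
    ]


def hyperbolic_distance(x, y, c=1.0):
    """Standard Poincare-ball distance: (2/sqrt(c)) * artanh(sqrt(c) * ||(-x) + y||)."""
    diff = mobius_add([-a for a in x], y, c)
    norm = math.sqrt(sum(d * d for d in diff))
    # Clamp to keep artanh finite for points numerically on the ball boundary.
    arg = min(math.sqrt(c) * norm, 1 - 1e-9)
    return 2 / math.sqrt(c) * math.atanh(arg)


def relabel_as_unknown(proposal, known_embeddings, delta, c=1.0):
    """Return True if the proposal lies within delta of its nearest known item.

    Assumption: the minimum over known items is used as the distance d.
    """
    d = min(hyperbolic_distance(proposal, k, c) for k in known_embeddings)
    return d < delta


# Hypothetical 2-D embeddings, chosen only for illustration.
known = [[0.12, 0.0], [0.9, 0.0]]
print(relabel_as_unknown([0.1, 0.0], known, delta=0.1))   # close to a known item
print(relabel_as_unknown([-0.9, 0.0], known, delta=0.1))  # far from all known items
```

In this sketch a proposal that is semantically close to a known item is promoted to "unknown", while one that is far from every known item is left as background; the threshold `delta` and the curvature `c` would have to be chosen as described in the paper.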