Open-Set Object Detection By Aligning Known Class Representations

Hiran Sarkar; Vishal Chudasama; Naoyuki Onoe; Pankaj Wasnik; Vineeth N Balasubramanian

TL;DR

Open-Set Object Detection (OSOD) addresses detecting known classes while identifying unknown objects, a task in which many detectors misclassify unknowns that are semantically close to known categories. The authors propose a semantic clustering-based framework that aligns region proposals with CLIP-based semantic embeddings, a class decorrelation module that enforces inter-cluster separation, and an object focus loss that strengthens objectness learning. These are complemented by an entropy-thresholding evaluation and a Harmonic Mean Precision (HMP) metric that jointly assesses known and unknown performance, defined as $HMP = \frac{2 \cdot mAP_k \cdot AP_u}{mAP_k + AP_u}$. Built on Faster R-CNN, the approach adds a 1x1 conv to the RPN head, CLIP-based semantic alignment, and a decorrelation step, yielding strong improvements on VOC and COCO across multiple open-set settings and backbones. Ablation studies verify the contribution of each component, and comparisons show higher AP$_u$, reduced AOSE, and improved WI and HMP relative to state-of-the-art methods such as OpenDet and OpenSetRCNN. The results suggest that semantic boundary alignment and robust objectness learning significantly advance OSOD, with potential extensions to incremental OSOD and open-set domain adaptation.
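
The HMP definition can be checked with a small helper. This is a minimal sketch: the function name and the convention of returning 0 when both precisions are 0 are assumptions, and the input values in the usage line are made up for illustration, not results from the paper.

```python
def harmonic_mean_precision(map_known: float, ap_unknown: float) -> float:
    """HMP = 2 * mAP_k * AP_u / (mAP_k + AP_u)."""
    if map_known < 0 or ap_unknown < 0:
        raise ValueError("precision values must be non-negative")
    denom = map_known + ap_unknown
    if denom == 0:
        return 0.0  # convention (assumption): both precisions zero -> HMP is zero
    return 2.0 * map_known * ap_unknown / denom


# Hypothetical values: strong known-class mAP, weak unknown AP.
print(harmonic_mean_precision(60.0, 20.0))  # 30.0, versus an arithmetic mean of 40.0
```

Because the harmonic mean is dominated by the smaller term, a detector cannot reach a high HMP by sacrificing unknown AP for known mAP (or vice versa).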

Abstract

Open-Set Object Detection (OSOD) has emerged as a contemporary research direction to address the detection of unknown objects. Recently, a few works have achieved remarkable performance on the OSOD task by employing contrastive clustering to separate unknown classes. In contrast, we propose a new semantic clustering-based approach that facilitates a meaningful alignment of clusters in semantic space, and we introduce a class decorrelation module to enhance inter-cluster separation. Our approach further incorporates an object focus module that predicts objectness scores, which enhances the detection of unknown objects. Further, we employ (i) an evaluation technique that penalizes low-confidence outputs to mitigate the risk of misclassifying unknown objects and (ii) a new metric called HMP that combines known and unknown precision using the harmonic mean. Our extensive experiments demonstrate that the proposed model achieves significant improvement on the MS-COCO and PASCAL VOC datasets for the OSOD task.
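
The evaluation technique that penalizes low-confidence outputs is described only as entropy thresholding; the exact rule is not given here. A minimal sketch, assuming a detection is relabeled as unknown when the normalized Shannon entropy of its softmax distribution over known classes exceeds a threshold. The function names, the `UNKNOWN` id, and the default threshold of 0.5 are hypothetical choices, not values from the paper.

```python
import math
from typing import List, Sequence

UNKNOWN = -1  # hypothetical label id for the "unknown" class


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over known-class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy divided by log(K), so the result lies in [0, 1]."""
    if len(probs) < 2:
        raise ValueError("need at least two known classes")
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def entropy_threshold_label(logits: Sequence[float], threshold: float = 0.5) -> int:
    """Return the argmax known class, or UNKNOWN if the prediction is too uncertain."""
    probs = softmax(logits)
    if normalized_entropy(probs) > threshold:
        return UNKNOWN  # low-confidence output treated as unknown (assumed rule)
    return max(range(len(probs)), key=probs.__getitem__)


print(entropy_threshold_label([5.0, 0.0, 0.0]))  # confident -> 0
print(entropy_threshold_label([1.0, 1.0, 1.0]))  # uniform -> -1 (UNKNOWN)
```

Normalizing by log(K) keeps the threshold comparable across settings with different numbers of known classes; whether the paper normalizes this way is an assumption.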

Paper Structure (25 sections, 10 equations, 9 figures, 8 tables)

Introduction
Related Works
Proposed Framework
Problem Statement & Notations
Semantic Clustering
Class Decorrelation
Object Focus Loss
Architectural Details
Experiments & Result Analysis
Implementation details
Result Analysis
Ablation Studies & Analysis
Conclusion & Future Work
Experimental Settings
Dataset Details
...and 10 more sections

Figures (9)

Figure 1: Effectiveness of semantic clustering and class decorrelation, and a visual comparison with OpenDet.
Figure 2: Overview of our proposed method. Object Focus Loss: the object focus loss is a combination of $L_C$ (a classification-free loss) and $L_{obj}$ (a classification-based loss). Semantic Clustering: $\{T_1, T_2, T_3, \ldots, T_k\}$ represents the class embeddings of $k$ classes and $\{F_1, F_2, F_3, \ldots, F_m\}$ represents $m$ feature embeddings. Each $F_i$ is aligned with its corresponding class embedding. Class Decorrelation: we sample one feature per unique class to create $\{F'_1, F'_2, F'_3, \ldots, F'_s\}$, where $s$ is the number of unique classes in an iteration. These sampled features are then orthogonalized against the remaining features.
Figure 3: Illustration of the object focus loss, with a visual comparison of results with and without the object focus module.
Figure 4: Visual comparison between our proposed model and baseline methods such as Faster R-CNN and OpenDet. More results are provided in the supplementary material. (Zoom in for a better view.)
Figure 5: Effect of weight coefficients $\alpha_1$, $\alpha_2$ and $\alpha_3$ in terms of HMP measure on VOC-COCO-40 setting.
...and 4 more figures
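
The semantic clustering and class decorrelation steps in the Figure 2 caption can be sketched as two scalar losses. This is a minimal forward-only sketch under stated assumptions: alignment is scored as 1 minus the cosine similarity between $F_i$ and its class embedding $T_{y_i}$, decorrelation as the squared cosine similarity between each sampled $F'_j$ and every feature of a different class, and features share a dimension with the class embeddings. These loss forms, the function names, and the toy 2-D vectors (standing in for CLIP text embeddings) are hypothetical, not the paper's exact formulation.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """Cosine similarity; eps guards against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def semantic_alignment_loss(features: Sequence[Vector], labels: Sequence[int],
                            class_embeddings: Dict[int, Vector]) -> float:
    """Mean of 1 - cos(F_i, T_{y_i}): pulls each feature toward its class embedding."""
    terms = [1.0 - cosine(f, class_embeddings[lbl]) for f, lbl in zip(features, labels)]
    return sum(terms) / len(terms)


def class_decorrelation_loss(features: Sequence[Vector], labels: Sequence[int],
                             rng: random.Random) -> float:
    """Sample one feature per unique class (F'_1..F'_s) and penalize its squared
    cosine similarity to every feature of a different class (0 = orthogonal)."""
    by_class: Dict[int, List[int]] = {}
    for idx, lbl in enumerate(labels):
        by_class.setdefault(lbl, []).append(idx)
    total, count = 0.0, 0
    for cls, idxs in by_class.items():
        rep = features[rng.choice(idxs)]  # the sampled F'_j for this class
        for f, lbl in zip(features, labels):
            if lbl != cls:
                total += cosine(rep, f) ** 2
                count += 1
    return total / count if count else 0.0


# Toy 2-D example; real T_k would be CLIP text embeddings of the class names.
toy_T = {0: [1.0, 0.0], 1: [0.0, 1.0]}
toy_F = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
toy_labels = [0, 0, 1]
print(semantic_alignment_loss(toy_F, toy_labels, toy_T))                 # 0.0
print(class_decorrelation_loss(toy_F, toy_labels, random.Random(0)))     # 0.0
```

Squaring the cosine penalizes both positive and negative correlation, so the minimum is reached only when the sampled representatives are orthogonal to features of other classes. In training these would be differentiable losses over proposal features rather than plain Python floats.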
