Self-Aware Object Detection via Degradation Manifolds

Stefan Becker, Simon Weiss, Wolfgang Hübner, Michael Arens

TL;DR

This work introduces a self-awareness framework for object detection based on degradation manifolds, which explicitly structure a detector's feature space according to image degradation rather than semantic content. The results suggest that degradation-aware representation geometry provides a practical, detector-agnostic foundation for self-aware object detection.

Abstract

Object detectors achieve strong performance under nominal imaging conditions but can fail silently when exposed to blur, noise, compression, adverse weather, or resolution changes. In safety-critical settings, it is therefore insufficient to produce predictions without assessing whether the input remains within the detector's nominal operating regime. We refer to this capability as self-aware object detection. We introduce a degradation-aware self-awareness framework based on degradation manifolds, which explicitly structure a detector's feature space according to image degradation rather than semantic content. Our method augments a standard detection backbone with a lightweight embedding head trained via multi-layer contrastive learning. Images sharing the same degradation composition are pulled together, while differing degradation configurations are pushed apart, yielding a geometrically organized representation that captures degradation type and severity without requiring degradation labels or explicit density modeling. To anchor the learned geometry, we estimate a pristine prototype from clean training embeddings, defining a nominal operating point in representation space. Self-awareness emerges as geometric deviation from this reference, providing an intrinsic, image-level signal of degradation-induced shift that is independent of detection confidence. Extensive experiments on synthetic corruption benchmarks, cross-dataset zero-shot transfer, and natural weather-induced distribution shifts demonstrate strong pristine-degraded separability, consistent behavior across multiple detector architectures, and robust generalization under semantic shift. These results suggest that degradation-aware representation geometry provides a practical and detector-agnostic foundation.
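The prototype-and-deviation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. Figure 1 specifies cosine distance to a pristine prototype as the score $S_{\mathrm{deg}}(\mathbf{x})$. How the prototype is estimated is not stated here, so the sketch assumes it is the re-normalized mean of L2-normalized clean embeddings. The function names and the synthetic 4-dimensional embeddings are hypothetical stand-ins for the output of the embedding head.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

def pristine_prototype(clean_embeddings):
    """ASSUMPTION: prototype = re-normalized mean of normalized clean embeddings."""
    z = l2_normalize(np.asarray(clean_embeddings, dtype=np.float64))
    return l2_normalize(z.mean(axis=0))

def degradation_score(embeddings, prototype):
    """Cosine distance (1 - cosine similarity) of each embedding to the prototype."""
    z = l2_normalize(np.atleast_2d(np.asarray(embeddings, dtype=np.float64)))
    return 1.0 - z @ prototype

# Synthetic stand-ins for embedding-head outputs (illustrative only, not real data).
rng = np.random.default_rng(0)
clean = np.array([1.0, 0.0, 0.0, 0.0]) + 0.05 * rng.normal(size=(64, 4))
degraded = np.array([0.0, 1.0, 0.0, 0.0]) + 0.05 * rng.normal(size=(8, 4))

proto = pristine_prototype(clean)
s_clean = degradation_score(clean, proto)
s_degraded = degradation_score(degraded, proto)
print("mean score, clean:   ", s_clean.mean())
print("mean score, degraded:", s_degraded.mean())
```

In this toy setup the clean samples score near 0 and the shifted samples score near 1. The score depends only on the image embedding, so it can be computed without looking at detection confidences.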

Paper Structure (17 sections, 18 equations, 13 figures, 8 tables)

Figures (13)

  • Figure 1: Proposed degradation-manifold framework for self-awareness. Multi-layer backbone features are fused via $1{\times}1$ projections and attention pooling and mapped into a normalized embedding space trained with contrastive degradation compositions. A pristine prototype, computed from clean training images, anchors the manifold to nominal operating conditions. At inference, cosine distance to this prototype yields the image-level degradation score $S_{\mathrm{deg}}(\mathbf{x})$.
  • Figure 2: Detection performance (mAP@0.5:0.95 and mAP@0.5) of different detection models under increasing corruption severity on COCO val2017. Results are shown for YOLOv9-m [Wang_ECCV_2024], YOLOv10-m [Wang_NeurIPS_2024], YOLOv11-m [yolo11_ultralytics], and the transformer-based RT-DETR-l [Zhao_2024_CVPR].
  • Figure 3: Relative detection performance drop (mAP@0.5:0.95) of YOLOv10-m [Wang_NeurIPS_2024] on COCO val2017 under increasing native severity levels. Each cell shows the percentage drop relative to the clean val2017 baseline (pristine COCO images, no added degradations). Brighter cells indicate larger drops. Left: degradations from [Michaelis_NeurIPSW_2019]. Right: degradations from [Agnolucci_WACV_2024].
  • Figure 4: t-SNE visualization [Hinton_Roweis_2003] of the learned degradation manifold under two corruption taxonomies. (Top) Degradations from [Michaelis_NeurIPSW_2019], based on [Hendrycks_ICLR_2019]. (Bottom) Degradations from [Agnolucci_WACV_2024]. Each color denotes a specific degradation type, while marker styles indicate the corresponding degradation group. For visualization, 100 COCO [Lin_ECCV_2014] samples are corrupted at severity level 5. Distinct clusters emerge for different degradation types. Only the degradations from [Agnolucci_WACV_2024] were used for training.
  • Figure 5: t-SNE visualization [Hinton_Roweis_2003] of the learned degradation manifold. Samples are drawn from multiple datasets: COCO [Lin_ECCV_2014], BDD [Yu_CVPR_2020], KITTI [Geiger_CVPR_2012], DETRAC [Wen_CVIU_2020], UAVDT [Du_ECCV_2018], and FLIR (VIS) [flirDataset]. Degradations are clearly separated in embedding space, while pristine images form a compact cluster across datasets, indicating content-independence of the learned representation. Notably, BDD and KITTI images were not used during training. This demonstrates cross-dataset generalization of the degradation manifold and shows that the embedding organizes images according to degradation characteristics rather than semantic content.
  • ...and 8 more figures
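
The per-cell quantity in Figure 3, the percentage drop relative to the clean baseline, can be written out as a short sketch. The function name and the mAP values below are hypothetical and chosen only to illustrate the arithmetic. They are not results from the paper.

```python
def relative_drop_percent(map_clean, map_degraded):
    """Percentage drop of a degraded mAP relative to the clean-baseline mAP."""
    if map_clean <= 0:
        raise ValueError("clean baseline mAP must be positive")
    return 100.0 * (map_clean - map_degraded) / map_clean

# Hypothetical mAP@0.5:0.95 values for one degradation type (illustrative only).
baseline = 0.50
by_severity = {1: 0.45, 3: 0.40, 5: 0.25}

# One row of a Figure-3-style grid: severity level -> relative drop in percent.
drops = {sev: relative_drop_percent(baseline, m) for sev, m in by_severity.items()}
for sev, drop in drops.items():
    print(f"severity {sev}: {drop:.1f}% drop")
```

A drop of 0% means the degraded mAP matches the clean baseline, and 100% means detection performance has collapsed entirely. A negative value would indicate that the degraded images scored higher than the baseline.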