
Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection

Tobias J. Riedlinger, Kira Maag, Hanno Gottschalk

Abstract

Deep neural networks have set the state of the art in computer vision tasks such as bounding box detection and semantic segmentation. Object detectors and segmentation models assign confidence scores to predictions, reflecting the model's uncertainty in object detection or pixel-wise classification. However, these confidence estimates are often miscalibrated, as the models' architectures and loss functions are tailored to task performance rather than to a probabilistic foundation. Even with well-calibrated predictions, object detectors fail to quantify uncertainty outside detected bounding boxes, i.e., the model does not make a probability assessment of whether an area without detected objects is truly free of obstacles. This poses a safety risk in applications such as automated driving, where uncertainty in empty areas remains unexplored. In this work, we propose an object detection model grounded in spatial statistics. Bounding box data matches realizations of a marked point process, commonly used to describe the probabilistic occurrence of spatial point events identified as bounding box centers, where marks are used to describe the spatial extension of bounding boxes and classes. Our statistical framework enables likelihood-based training and provides well-defined confidence estimates for whether a region is drivable, i.e., free of objects. We demonstrate the effectiveness of our method through calibration assessments and evaluation of performance.
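
To illustrate what a confidence estimate for "free of objects" can look like, the following is a minimal sketch that assumes a plain (unmarked) Poisson point process with a per-pixel intensity map. For such a process, the probability that a region A contains no points is exp(-∫_A λ(x) dx). The function name `void_probability`, the toy intensity values, and the region mask are hypothetical and chosen only for illustration; the abstract does not specify how the proposed conditional marked model computes this quantity, so this should not be read as the authors' implementation.

```python
import numpy as np


def void_probability(intensity, mask, pixel_area=1.0):
    """Probability that a Poisson point process has no points in a region.

    intensity:  2D array of non-negative intensity values, one per pixel.
    mask:       2D boolean array of the same shape selecting the region.
    pixel_area: area represented by one pixel (assumed constant).
    """
    intensity = np.asarray(intensity, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if intensity.shape != mask.shape:
        raise ValueError("intensity and mask must have the same shape")
    if np.any(intensity < 0):
        raise ValueError("intensity must be non-negative")
    # Riemann-sum approximation of the integral of the intensity over the region.
    expected_count = float(np.sum(intensity[mask]) * pixel_area)
    # For a Poisson process, P(N(A) = 0) = exp(-expected number of points in A).
    return float(np.exp(-expected_count))


# Hypothetical toy example: low background intensity with one sharp peak.
intensity = np.full((4, 4), 0.01)
intensity[1, 2] = 2.0

region = np.zeros((4, 4), dtype=bool)
region[2:, :] = True  # bottom two rows, away from the peak

p_empty = void_probability(intensity, region)
print(f"P(region is empty) = {p_empty:.4f}")
```

A region that includes the peak would receive a much lower probability of being empty, which is the qualitative behavior the abstract asks of a detector in empty space.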

Paper Structure

This paper contains 33 sections, 17 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Left: Semantic segmentation prediction. Center left: Poisson point process intensity. Center right: Conditional marked Poisson point process intensity. Right: Bounding box prediction.
  • Figure 2: Intensity landscape over an input image from the Cityscapes val dataset. Peaks are mostly sharply localized and indicate foreground detections.
  • Figure 3: Confidence calibration plots for semantic segmentation (blue) and PPP (orange) with corresponding ECE for the Cityscapes dataset and the DeepLabv3+ detector and $s=1,\!000$.
  • Figure 4: Visualization of ensemble intensity variance on two Cityscapes street scenes. We observe behavior similar to that of individual models: the intensities for objects in the distance are sharply peaked, as are the standard deviations computed here. The top image shows that objects close to the ego car have medium variances spread out over a larger area.
  • Figure 5: Alternative two-stage object detection architecture based on a first-stage FCNN model for the intensity function. The second stage consists of a UNet encoder-decoder model with two heads, one predicting the spatial extension of bounding boxes and the other predicting the class. The second-stage model is trained separately from the first stage on the original input images and the intensity predictions.
  • ...and 2 more figures