Run-time Monitoring of 3D Object Detection in Automated Driving Systems Using Early Layer Neural Activation Patterns

Hakan Yekta Yatbaz; Mehrdad Dianati; Konstantinos Koufos; Roger Woodman

TL;DR

This work addresses the safety-critical problem of runtime integrity monitoring for LiDAR-based 3D object detectors in automated driving systems. It systematically investigates activation patterns across backbone layers and introduces a multi-layer fusion introspection pipeline that combines processed point clouds, mid-layer activations, and output activations to detect detection errors as a binary $\text{Error}$ vs. $\text{No-Error}$ decision. Through experiments on KITTI and NuScenes with PointPillars and CenterPoint (SECOND backbones), the study demonstrates that early-layer activations can improve error detection, while concatenating activations from multiple layers offers a balanced trade-off between performance and computation. The proposed approach achieves strong AUROC and favorable real-time performance, underscoring its potential to enhance safety and trust in ADS by enabling timely alerts or fallback maneuvers, with avenues for future work in domain shift handling and more advanced multi-layer fusion strategies.
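To make the AUROC figure of merit concrete, the sketch below computes it for a binary Error vs. No-Error introspector using the pairwise (Mann-Whitney) definition. It is an illustration only, not the authors' evaluation code: the function name `auroc`, the choice of Error as the positive class (label 1), and the toy labels and scores are all assumptions.

```python
def auroc(labels, scores):
    """Pairwise AUROC: the fraction of (Error, No-Error) frame pairs in which
    the Error frame receives the higher score. Ties count as half a win.

    Assumption: label 1 means Error (positive class), label 0 means No-Error.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one frame of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: five frames with hypothetical introspector scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.5, 0.1]
print(auroc(labels, scores))  # 5 of 6 pairs are ranked correctly, about 0.833
```

The quadratic loop is chosen for readability; a rank-based formulation or a library routine would be preferable for full validation sets.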

Abstract

Monitoring the integrity of object detection for errors within the perception module of automated driving systems (ADS) is paramount for ensuring safety. Despite recent advancements in deep neural network (DNN)-based object detectors, their susceptibility to detection errors, particularly in the less-explored realm of 3D object detection, remains a significant concern. State-of-the-art integrity monitoring (also known as introspection) mechanisms in 2D object detection mainly utilise the activation patterns in the final layer of the DNN-based detector's backbone. However, this may not sufficiently address the complexity and sparsity of data in 3D object detection. To this end, we conduct an extensive investigation into how activation patterns extracted from various layers of the backbone network affect the introspection of 3D object detectors. Through a comparative analysis using the KITTI and NuScenes datasets with the PointPillars and CenterPoint detectors, we demonstrate that using activation patterns from earlier layers enhances the error detection performance of the integrity monitoring system, but increases computational complexity. To address the real-time operation requirements in ADS, we also introduce a novel introspection method that combines activation patterns from multiple layers of the detector's backbone and report its performance.

Paper Structure (15 sections, 4 figures, 2 tables)

This paper contains 15 sections, 4 figures, 2 tables.

Introduction
Related Work
Method
Performance Evaluations
Object Detectors
Datasets
Introspection Mechanisms
Introspection Training and Implementation
Performance Metrics
Performance Comparison
Detection Performance
Model Confidence
Computational Complexity
Qualitative Comparison
Summary & Conclusions

Figures (4)

Figure 1: LiDAR-based object detection pipeline depicted at the top starting with a processor network that extracts features from the point cloud. The extracted features are then processed by a backbone network to compute neural activation patterns at the mid and last layers. These patterns are managed by the Neural Activation Pattern Operator as part of the introspection framework, which either combines them (our proposed method) or chooses patterns from an earlier layer (as part of our investigation). Finally, the selected pattern feeds into the Introspection Network, which classifies the collected point cloud as 'Error' or 'No-Error'.
Figure 2: Proposed introspection mechanism for LiDAR-based 3D object detection. The introspection mechanism depicted at the bottom captures the processed point cloud data, mid-layer neural activations, and backbone network outputs from the main object detection pipeline. An adaptive average pooling layer spatially adjusts these inputs to ensure uniform feature representation before concatenation, albeit with some resolution loss. The concatenated features are fed into the introspector network that comprises a ResNet18 for feature extraction and a fully-connected network for error prediction, ultimately assessing object detection errors as binary classification.
Figure 3: Comparative analysis of confidence distributions across different inputs for introspection. The violin plots merged with a boxplot depict the confidence distributions for a trained neural network when tested on two datasets: KITTI and NuScenes. Each row represents an input modality: (a) PPC, (b) MLA, (c) LLA, and (d) the proposed method. Within each modality, distributions are provided for true positives, false positives, false negatives, and true negatives. The width of each plot indicates the probability density of the data at different confidence levels, with mean and interquartile ranges also shown.
Figure 4: Max activation maps and Eigen-CAM visualisations for example frames on the KITTI and NuScenes datasets. Every row represents a different activation map modality: PPC, MLA, LLA, and proposed. The first and third columns display the channel-wise max activations for the KITTI and NuScenes datasets, while the second and fourth columns exhibit the respective Eigen-CAM heatmaps that highlight areas critical to the classification. For clarity, correctly detected objects are marked with green boxes, while missed ones are highlighted with orange boxes.
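
The fusion step described in the Figure 2 caption (adaptive average pooling to a common spatial size, followed by channel-wise concatenation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy channel counts and grid sizes, and the 2x2 target grid are hypothetical, and the bin-edge rule is the common floor/ceil convention for adaptive pooling. The ResNet18 and fully-connected stages that consume the fused tensor are omitted.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def adaptive_avg_pool2d(fmap, out_h, out_w):
    """Pool a feature map given as nested lists [C][H][W] to [C][out_h][out_w].

    Output cell (i, j) averages input rows floor(i*H/out_h) .. ceil((i+1)*H/out_h)
    and the analogous column range, so any input size maps onto the target grid.
    """
    pooled = []
    for channel in fmap:
        in_h, in_w = len(channel), len(channel[0])
        out_channel = []
        for i in range(out_h):
            r0, r1 = (i * in_h) // out_h, ceil_div((i + 1) * in_h, out_h)
            row = []
            for j in range(out_w):
                c0, c1 = (j * in_w) // out_w, ceil_div((j + 1) * in_w, out_w)
                cells = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
                row.append(sum(cells) / len(cells))
            out_channel.append(row)
        pooled.append(out_channel)
    return pooled


def fuse_activations(maps, out_h, out_w):
    """Pool every map to a common grid, then concatenate along the channel axis."""
    fused = []
    for fmap in maps:
        fused.extend(adaptive_avg_pool2d(fmap, out_h, out_w))
    return fused


def constant_map(channels, height, width, value):
    """Hypothetical helper: a [C][H][W] map filled with one value."""
    return [[[float(value)] * width for _ in range(height)] for _ in range(channels)]


# Toy stand-ins for the three inputs; all shapes are illustrative assumptions.
ppc = constant_map(2, 8, 8, 1.0)  # processed point cloud features
mla = constant_map(3, 4, 4, 2.0)  # mid-layer activations
lla = constant_map(4, 2, 2, 3.0)  # last-layer (backbone output) activations

fused = fuse_activations([ppc, mla, lla], out_h=2, out_w=2)
print(len(fused), len(fused[0]), len(fused[0][0]))  # prints: 9 2 2
```

Pooling the larger maps down to the coarsest grid is what introduces the resolution loss noted in the caption; in exchange, the introspector receives a single tensor whose channel count is the sum of the three inputs' channels.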
