Improving Batch Normalization with TTA for Robust Object Detection in Self-Driving

Dacheng Liao, Mengshi Qi, Liang Liu, Huadong Ma

TL;DR

This work tackles domain shift in autonomous driving perception by addressing the instability of test-time BN adaptation in large, deep networks. It introduces LearnableBN, a BN layer with auxiliary parameters, together with a Generalized-search Entropy Minimization (GSEM) loss that predicts BN statistics without an exponential moving average (EMA), and pairs them with a semantic-consistency-based dual-stage adaptation that filters unstable samples. The approach yields improvements of up to about 8% on NuScenes-C across six corruption types and three severity levels, and demonstrates strong generalization to other baselines and datasets (e.g., Sparse4D and KITTI-C). The combination of per-layer BN modulation, stabilized entropy minimization, and semantic-signal filtering provides a robust, plug-in TTA framework for real-world, open-world autonomous driving scenarios.

Abstract

In current open real-world autonomous driving scenarios, challenges such as sensor failure and extreme weather conditions hinder the generalization of most autonomous driving perception models to these unseen domains, because of the domain shift between the test and training data. As the parameter scale of autonomous driving perception models grows, traditional test-time adaptation (TTA) methods become unstable and often degrade model performance in most scenarios. To address these challenges, this paper proposes two new robust methods to improve Batch Normalization with TTA for object detection in autonomous driving: (1) We introduce a LearnableBN layer based on the Generalized-search Entropy Minimization (GSEM) method. Specifically, we modify the traditional BN layer by incorporating auxiliary learnable parameters, which enable the BN layer to dynamically update its statistics according to the input data. (2) We propose a new semantic-consistency-based dual-stage adaptation strategy, which encourages the model to iteratively search for the optimal solution and eliminates unstable samples during the adaptation process. Extensive experiments on the NuScenes-C dataset show that our method achieves a maximum improvement of about 8% using BEVFormer as the baseline model across six corruption types and three levels of severity. We will make our source code available soon.
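The idea of a BN layer whose statistics are modulated by auxiliary parameters, adapted by minimizing prediction entropy, can be illustrated with a minimal sketch. This is not the authors' LearnableBN or GSEM formulation, which the abstract does not spell out. The additive mean offset `d_mean`, the log-variance offset `d_logvar`, and the function names are assumptions made for illustration, and the loss shown is plain Shannon entropy rather than GSEM.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; plain entropy minimization, not GSEM."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bn_with_aux(x, mean, var, gamma, beta, d_mean=0.0, d_logvar=0.0, eps=1e-5):
    """Normalize one channel with stored statistics shifted by auxiliary parameters.

    mean, var, gamma, beta are the frozen source-domain BN values.
    d_mean and d_logvar are hypothetical auxiliary parameters: the only
    quantities that would be updated at test time in this sketch.
    """
    adj_mean = mean + d_mean              # assumed additive shift of the mean
    adj_var = var * math.exp(d_logvar)    # assumed multiplicative shift, keeps var > 0
    scale = gamma / math.sqrt(adj_var + eps)
    return [scale * (v - adj_mean) + beta for v in x]


# With zero auxiliary parameters the layer reduces to standard inference-time BN.
activations = [1.0, 3.0]
plain = bn_with_aux(activations, mean=2.0, var=1.0, gamma=1.0, beta=0.0)
shifted = bn_with_aux(activations, mean=2.0, var=1.0, gamma=1.0, beta=0.0, d_mean=1.0)
```

In a real model the auxiliary parameters would be tensors updated by gradient descent on the loss while every other weight stays frozen; the scalar version above only shows how the stored statistics enter the normalization.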

Paper Structure

This paper contains 27 sections, 9 equations, 14 figures, 8 tables, 1 algorithm.

Figures (14)

  • Figure 1: Illustration of the problem: a BEV-based 3D object detection model struggles to perceive unseen domains caused by extreme weather conditions. To enhance the robustness of the model, the TTA method estimates the BN statistics of the unseen domains during the testing phase.
  • Figure 2: Method Overview. Module (a) shows the Semantic-Consistency based Dual-Stage-Adaptation, which consists of a stable adaptation phase with a low learning rate and an aggressive adaptation phase with a high learning rate. In the aggressive adaptation phase, the model trained in the stable adaptation phase is used to predict the same samples, and the KL divergence between the two sets of predictions is calculated to filter noisy samples. Module (b) illustrates the training process. First, auxiliary learnable parameters are introduced into the BN layer; all model parameters are frozen, and only the auxiliary parameters are learnable. Adaptation is then conducted using the GSEM loss function. Note that the BN statistics are not changed during forward propagation, but are rectified after optimization.
  • Figure 3: Comparison of detection results for different categories in snow scenarios at Hard severity. The baseline model is BEVFormer.
  • Figure 4: Example of BEV visualization results, where green bounding boxes are the ground truth, blue bounding boxes are the predictions, and red boxes highlight the differences before and after using our proposed LearnableBN.
  • Figure 5: Examples of visualization results from six perspectives, where the red boxes highlight the differences before and after using our proposed LearnableBN. Results on cars are colored in yellow, pedestrians in blue, and cyclists in green.
  • ...and 9 more figures
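
The sample filtering described in the Figure 2 caption (comparing predictions from the stable-phase model with those of the model being adapted, and discarding samples whose predictions diverge) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the direction of the KL divergence, the `threshold` value, and the function names are assumptions, and per-sample class distributions stand in for full detection outputs.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats between two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def filter_consistent(stable_preds, current_preds, threshold=0.1):
    """Return indices of samples whose two predictions agree.

    stable_preds:  class distributions from the stable-phase model.
    current_preds: class distributions from the model under aggressive adaptation.
    threshold:     hypothetical cutoff; samples above it are treated as noisy.
    """
    keep = []
    for i, (p, q) in enumerate(zip(stable_preds, current_preds)):
        if kl_divergence(p, q) <= threshold:
            keep.append(i)
    return keep


stable = [[0.9, 0.1], [0.5, 0.5]]
current = [[0.88, 0.12], [0.05, 0.95]]
kept = filter_consistent(stable, current)  # only the first sample is consistent
```

Only the retained samples would contribute to the loss in the aggressive phase; how the cutoff is chosen in the paper is not stated in this excerpt.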