Improving Batch Normalization with TTA for Robust Object Detection in Self-Driving
Dacheng Liao, Mengshi Qi, Liang Liu, Huadong Ma
TL;DR
This work tackles domain shift in autonomous-driving perception by addressing the instability of test-time BN adaptation in large, deep networks. It introduces LearnableBN, a BN layer with auxiliary learnable parameters trained with a Generalized-search Entropy Minimization (GSEM) loss. These parameters predict BN statistics without relying on an exponential moving average (EMA). LearnableBN is paired with a semantic-consistency-based dual-stage adaptation strategy that filters out unstable samples. The approach yields improvements of up to about 8% on NuScenes-C (with BEVFormer as the baseline) across six corruption types and three severity levels. It also generalizes well to other baselines and datasets (e.g., Sparse4D and KITTI-C). Together, per-layer BN modulation, stabilized entropy minimization, and semantic-consistency filtering form a robust, plug-in TTA framework for real-world, open-world autonomous driving.
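The summary does not give LearnableBN's exact parametrization, so the following is only a minimal sketch under stated assumptions. For a single channel, a hypothetical learnable logit `alpha` decides how far the normalization statistics move away from the stored source statistics and toward the current test batch. This stands in for a fixed EMA update. Only the forward pass is shown, in plain Python. In practice `alpha` would be optimized by backpropagating an entropy loss (GSEM in the paper), which this sketch does not attempt. The names `learnable_bn`, `stable_sigmoid`, `alpha`, `src_mean`, and `src_var` are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence


def stable_sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def learnable_bn(batch: Sequence[float], src_mean: float, src_var: float,
                 alpha: float, gamma: float = 1.0, beta: float = 0.0,
                 eps: float = 1e-5) -> List[float]:
    """Forward pass of a hypothetical LearnableBN for one channel.

    alpha is an assumed learnable logit; w = sigmoid(alpha) mixes the
    current test-batch statistics with the stored source statistics.
    """
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
    w = stable_sigmoid(alpha)  # weight on test-batch statistics, between 0 and 1
    mean = w * batch_mean + (1.0 - w) * src_mean
    var = w * batch_var + (1.0 - w) * src_var
    scale = gamma / math.sqrt(var + eps)
    return [scale * (x - mean) + beta for x in batch]


batch = [1.0, 2.0, 3.0, 4.0]
print(learnable_bn(batch, src_mean=0.0, src_var=1.0, alpha=-50.0))  # ~source stats
print(learnable_bn(batch, src_mean=0.0, src_var=1.0, alpha=50.0))   # ~batch stats
```

A very negative `alpha` makes the layer fall back to the source statistics, which is standard inference-mode BN. A very positive `alpha` makes it normalize with the test batch's own statistics, as in plain test-time BN. A learned value in between lets each layer choose its own blend. That is one plausible reading of "per-layer BN modulation."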
Abstract
In open, real-world autonomous driving scenarios, challenges such as sensor failure and extreme weather hinder most perception models from generalizing to unseen domains, because of the domain shift between training and test data. As the parameter scale of autonomous driving perception models grows, traditional test-time adaptation (TTA) methods become unstable and degrade model performance in most scenarios. To address these challenges, this paper proposes two robust methods that improve Batch Normalization-based TTA for object detection in autonomous driving. (1) We introduce a LearnableBN layer based on a Generalized-search Entropy Minimization (GSEM) method. Specifically, we modify the traditional BN layer by adding auxiliary learnable parameters, which enable the BN layer to dynamically update its statistics according to the input data. (2) We propose a semantic-consistency-based dual-stage adaptation strategy, which encourages the model to iteratively search for the optimal solution and eliminates unstable samples during adaptation. Extensive experiments on the NuScenes-C dataset show that our method achieves a maximum improvement of about 8% with BEVFormer as the baseline model, across six corruption types and three severity levels. We will make our source code available soon.
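The sample-filtering idea can be illustrated with a hypothetical criterion, because the abstract does not define how semantic consistency is measured. In this sketch, each sample's class logits are recorded after each of two adaptation stages. A sample is kept for the entropy-minimization update only if two conditions hold: its predicted class agrees across the stages, and its second-stage prediction entropy is below a threshold `max_entropy`. Plain Shannon entropy stands in for GSEM, whose generalized-search form is not described here. All function names and the threshold value are assumptions for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(logits: Sequence[float]) -> float:
    # Shannon entropy (in nats) of the softmax distribution.
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def argmax(values: Sequence[float]) -> int:
    # Index of the largest value; ties resolve to the first index.
    return max(range(len(values)), key=lambda i: values[i])


def consistent_samples(stage1: Sequence[Sequence[float]],
                       stage2: Sequence[Sequence[float]],
                       max_entropy: float) -> List[int]:
    """Indices of samples whose class prediction is stable across both
    stages and whose stage-2 entropy is below max_entropy (hypothetical rule)."""
    keep = []
    for i, (a, b) in enumerate(zip(stage1, stage2)):
        if argmax(a) == argmax(b) and entropy(b) < max_entropy:
            keep.append(i)
    return keep


def filtered_entropy_loss(logits: Sequence[Sequence[float]],
                          keep: Sequence[int]) -> float:
    # Mean entropy over the retained samples only; 0.0 if none survive.
    if not keep:
        return 0.0
    return sum(entropy(logits[i]) for i in keep) / len(keep)


stage1 = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
stage2 = [[3.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
keep = consistent_samples(stage1, stage2, max_entropy=0.5)
print(keep)  # [0]
```

Here sample 1 is dropped because its predicted class flips between stages. Sample 2 is dropped because its prediction is uniform, with entropy ln 3 ≈ 1.10. Only the confident, stable sample contributes to the loss, which is the intended effect of excluding unstable samples from adaptation.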
