Fully Test-Time Adaptation for Monocular 3D Object Detection

Hongbin Lin; Yifan Zhang; Shuaicheng Niu; Shuguang Cui; Zhen Li

TL;DR

This work tackles out-of-distribution (OOD) generalization for monocular 3D object detection by introducing MonoTTA, a Fully Test-Time Adaptation framework. MonoTTA combines two complementary strategies. Reliability-driven Adaptation selects high-score detections as reliable cues and optimizes the model with an adaptive optimization loss $\mathcal{L}_{AO}$. Noise-Guard Adaptation exploits low-score detections through a negative-learning regularizer $\mathcal{L}_{Nreg}$ to prevent overfitting to noise and trivial solutions. The method updates only batch normalization parameters at test time and uses an adaptive threshold $\alpha_t$ to balance reliable and noisy signals, enabling real-time adaptation. Experiments on KITTI-C and nuScenes show substantial gains over strong baselines, including under more severe corruptions and real-world day/night scenarios, supporting MonoTTA's effectiveness for robust, fully test-time monocular 3D detection. Overall, the work shows that high-score detections remain relatively reliable under corruption, and that adapting on these detections, complemented by negative learning on low-score detections, substantially improves performance in OOD conditions, with practical implications for autonomous driving systems.
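The selection-and-loss logic named above (adaptive threshold, reliable vs. low-score objects, $\mathcal{L}_{AO}$ and $\mathcal{L}_{Nreg}$) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the source does not give the formulas, so the EMA-of-top-scores threshold, the $-\log s$ form used for $\mathcal{L}_{AO}$, the negative-learning form $-\log(1 - s)$ used for $\mathcal{L}_{Nreg}$, the weight `lam`, and all function names are hypothetical. It works on scalar detection scores only; in the real method the loss would be backpropagated into the batch normalization parameters of a Mono 3Det network.

```python
import math


def update_threshold(alpha_prev, scores, momentum=0.9, top_frac=0.1):
    """Hypothetical adaptive threshold alpha_t: an exponential moving average of
    the score at the cut-off of the batch's top `top_frac` detections.
    `scores` must be non-empty."""
    ranked = sorted(scores, reverse=True)
    k = max(1, int(len(ranked) * top_frac))  # k >= 1, so the cut-off is defined
    batch_cut = ranked[k - 1]
    return momentum * alpha_prev + (1.0 - momentum) * batch_cut


def split_objects(scores, alpha):
    """Partition detection scores into reliable (>= alpha) and low-score (< alpha)."""
    reliable = [s for s in scores if s >= alpha]
    low = [s for s in scores if s < alpha]
    return reliable, low


def adaptive_optimization_loss(reliable, eps=1e-6):
    """Assumed stand-in for L_AO: mean -log(score) over reliable objects,
    which pushes their confidence up. Returns 0.0 if nothing is selected."""
    if not reliable:
        return 0.0
    return sum(-math.log(max(s, eps)) for s in reliable) / len(reliable)


def negative_regularizer(low, eps=1e-6):
    """Assumed stand-in for L_Nreg: negative learning, mean -log(1 - score),
    i.e. "this low-score object is not a confident detection". Minimizing it
    pushes low scores down, counteracting L_AO raising every score."""
    if not low:
        return 0.0
    return sum(-math.log(max(1.0 - s, eps)) for s in low) / len(low)


def monotta_style_objective(scores, alpha_prev, lam=1.0):
    """Return (total loss, new threshold, #reliable, #low) for one test batch."""
    alpha = update_threshold(alpha_prev, scores)
    reliable, low = split_objects(scores, alpha)
    loss = adaptive_optimization_loss(reliable) + lam * negative_regularizer(low)
    return loss, alpha, len(reliable), len(low)


# Hypothetical scores for one corrupted test batch (mostly low-confidence).
batch_scores = [0.9, 0.6, 0.3, 0.1, 0.05, 0.04, 0.03, 0.02, 0.02, 0.01]
loss, alpha, n_reliable, n_low = monotta_style_objective(batch_scores, alpha_prev=0.5)
print(f"alpha={alpha:.2f} reliable={n_reliable} low={n_low} loss={loss:.3f}")
```

Using two opposing terms is the point of the design: the first alone could be minimized trivially by inflating every score, while the second keeps the many low-score objects from being pulled up indiscriminately.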

Abstract

Monocular 3D object detection (Mono 3Det) aims to identify 3D objects from a single RGB image. However, existing methods often assume that training and test data follow the same distribution, which may not hold in real-world test scenarios. To address out-of-distribution (OOD) problems, we explore a new adaptation paradigm for Mono 3Det, termed Fully Test-time Adaptation. It aims to adapt a well-trained model to unlabeled test data by handling potential data distribution shifts at test time, without access to training data or test labels. However, applying this paradigm to Mono 3Det poses significant challenges, because OOD test data cause a remarkable decline in object detection scores. This decline conflicts with the pre-defined score thresholds of existing detection methods, leading to severe object omissions (i.e., rare positive detections and many false negatives). Consequently, the scarcity of positive detections and the abundance of noisy predictions cause test-time adaptation to fail in Mono 3Det. To handle this problem, we propose a novel Monocular Test-Time Adaptation (MonoTTA) method based on two new strategies. 1) Reliability-driven adaptation: we empirically find that high-score objects are still reliable and that optimizing high-score objects can enhance confidence across all detections. Thus, we devise a self-adaptive strategy to identify reliable objects for model adaptation, which discovers potential objects and alleviates omissions. 2) Noise-guard adaptation: since high-score objects may be scarce, we develop a negative regularization term to exploit the numerous low-score objects via negative learning, preventing overfitting to noise and trivial solutions. Experimental results show that MonoTTA brings significant performance gains for Mono 3Det models in OOD test scenarios, with average gains of approximately 190% on KITTI and 198% on nuScenes.
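The conflict between degraded scores and a pre-defined threshold can be illustrated with a toy calculation. The scores are made up, and the uniform 4x score drop is an assumption chosen only to mimic the kind of decline described in the abstract; the 0.2 threshold is the example value given in the Figure 1 caption. The helper name is hypothetical.

```python
def kept_detections(scores, threshold=0.2):
    """Detections that survive a fixed, pre-defined score threshold."""
    return [s for s in scores if s >= threshold]


# Hypothetical scores for the same five objects in one image.
in_dist_scores = [0.85, 0.72, 0.64, 0.41, 0.30]
# Assumed OOD effect: every score shrinks by a factor of 4 (e.g. under fog).
ood_scores = [s / 4 for s in in_dist_scores]

kept_in = kept_detections(in_dist_scores)
kept_ood = kept_detections(ood_scores)
print(f"in-distribution: {len(kept_in)}/5 kept, OOD: {len(kept_ood)}/5 kept")
```

All five objects pass the threshold in-distribution, but only the strongest one survives after the drop. The four lost objects are exactly the omissions that leave too few positive detections to drive adaptation, which motivates selecting adaptation targets with a self-adaptive threshold rather than a fixed one.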

Paper Structure (17 sections, 6 equations, 11 figures, 13 tables, 1 algorithm)

This paper contains 17 sections, 6 equations, 11 figures, 13 tables, 1 algorithm.

Introduction
Related Work
Monocular Test-Time Adaptation
Problem Statement
Overall Scheme
Reliability-Driven Adaptation
Noise-Guard Adaptation
Experiments
Comparisons with Previous Methods
More Severe Corruption and Real Scenario
Application to Instance-Level Inference Method
Ablation Studies and Quantitative Results
Conclusion
More Related Work and Discussions
More Details on Dataset Construction
...and 2 more sections

Figures (11)

Figure 1: An illustration of the generalizability issue of Mono 3Det models. Compared with in-distribution (In-dis) scenarios (e.g., sunny), detection scores on out-of-distribution (OOD) test data degrade severely when the well-trained model (MonoFlex zhang2021objects) is directly applied to test scenarios affected by common natural disruptions, such as weather changes (e.g., snow and fog). Since existing Mono 3Det methods mainly adopt a pre-defined score threshold (e.g., 0.2) for object detection, this leads to severe omissions and unreliable detections, and thus to serious performance degradation. Note that the test images are the same but under different weather conditions.
Figure 1: An illustration of 13 distinct types of corruptions at severity level 5 of the KITTI-C dataset.
Figure 2: An illustration of our MonoTTA. During the test phase, only the pre-trained model $f_{\Theta_0}(\textbf{x})$ and unlabeled test images $\{\textbf{x}_i\}_{i=1}^{N_t}$ are given. To conduct model adaptation, we initialize the model $f_{\Theta}(\textbf{x})$ with $\Theta_0$ and update only the parameters of the batch normalization layers. When a batch of test images arrives, we first compute the test object scores and refine the adaptive threshold $\alpha$ to select reliable high-score objects, and then optimize $\Theta$ via the adaptive optimization loss $\mathcal{L}_{AO}$. Meanwhile, we devise a negative regularization term $\mathcal{L}_{Nreg}$ to help the model avoid overfitting to noise and trivial solutions. Here, Ped. and Cyc. denote Pedestrian and Cyclist in KITTI.
Figure 2: An illustration of 4 common types of corruptions in real applications at severity levels 1 to 5 of the KITTI-C dataset.
Figure 3: Based on MonoGround qin2022monoground, we conduct two empirical studies (Car, KITTI) with a 3D IoU threshold of 0.5. (a) We visualize the accuracy of objects across different score ranges, which shows that the accuracy of high-score objects remains relatively stable even under diverse corruptions (Ideal denotes in-distribution scenarios). (b) We visualize the numbers of low- and high-score objects before and after optimization. Although only high-score objects are optimized, the model also becomes more confident on low-score objects.
...and 6 more figures
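The per-score-range analysis described for Figure 3(a) can be approximated with a small helper: group detections by score range and report the fraction that are true positives, where a detection counts as correct if it matches ground truth at 3D IoU of at least 0.5 (as in the caption). This is a sketch under assumptions: the 3D IoU matching step is taken as already done, and the function name, bin edges, and example inputs are hypothetical rather than taken from the paper.

```python
def accuracy_by_score_range(scores, is_true_positive,
                            edges=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Map each score range [lo, hi) to the fraction of detections in it that are
    true positives (None if the range is empty). The last range also includes hi."""
    result = {}
    last_hi = edges[-1]
    for lo, hi in zip(edges[:-1], edges[1:]):
        flags = [tp for s, tp in zip(scores, is_true_positive)
                 if lo <= s < hi or (hi == last_hi and s == hi)]
        result[(lo, hi)] = sum(flags) / len(flags) if flags else None
    return result


# Hypothetical detections already matched against ground truth at 3D IoU >= 0.5.
scores = [0.10, 0.15, 0.50, 0.85, 0.90, 1.00]
matched = [False, True, True, True, True, True]
for (lo, hi), acc in accuracy_by_score_range(scores, matched).items():
    print(f"[{lo:.1f}, {hi:.1f}): {acc}")
```

Running the same helper on detections from clean and corrupted images, and comparing the high-score bins, is one way to check the claim that high-score objects stay relatively accurate under corruption.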
