MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection

Zhuoxiao Chen, Junjie Meng, Mahsa Baktashmotlagh, Yonggang Zhang, Zi Huang, Yadan Luo

TL;DR

The paper tackles the problem of test-time adaptation for LiDAR-based 3D object detection under real-world domain shifts, including challenging cross-corruption scenarios. It introduces MOS, a model-synergy framework that dynamically assembles a super model from a bank of historical checkpoints using synergy weights derived from a generalized Gram matrix built from feature- and output-level similarities. By selectively leveraging diverse long-term knowledge and updating the bank, MOS achieves robust adaptation across cross-dataset, corruption, and cross-corruption shifts, outperforming online TTA baselines and approaching Oracle performance in several tasks, albeit with higher computational demands. The work provides a practical, model-agnostic approach to real-time 3D detection deployment and offers a thorough analysis of components, ablations, and efficiency considerations, with future work aimed at improving time/space efficiency.

Abstract

LiDAR-based 3D object detection is crucial for various applications but often experiences performance degradation in real-world deployments due to domain shifts. While most studies focus on cross-dataset shifts, such as changes in environments and object geometries, practical corruptions from sensor variations and weather conditions remain underexplored. In this work, we propose a novel online test-time adaptation framework for 3D detectors that effectively tackles these shifts, including a challenging cross-corruption scenario where cross-dataset shifts and corruptions co-occur. By leveraging long-term knowledge from previous test batches, our approach mitigates catastrophic forgetting and adapts effectively to diverse shifts. Specifically, we propose a Model Synergy (MOS) strategy that dynamically selects historical checkpoints with diverse knowledge and assembles them to best accommodate the current test batch. This assembly is directed by our proposed Synergy Weights (SW), which perform a weighted averaging of the selected checkpoints, minimizing redundancy in the composite model. The SWs are computed by evaluating the similarity of predicted bounding boxes on the test data and the independence of features between checkpoint pairs in the model bank. To maintain an efficient and informative model bank, we discard checkpoints with the lowest average SW scores, replacing them with newly updated models. Our method was rigorously tested against existing test-time adaptation strategies across three datasets and eight types of corruptions, demonstrating superior adaptability to dynamic scenes and conditions. Notably, it achieved a 67.3% improvement in a challenging cross-corruption scenario, offering a more comprehensive benchmark for adaptation. Source code: https://github.com/zhuoxiao-chen/MOS.
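The mechanism described above (weigh checkpoints by how much non-redundant knowledge they carry, average them into one model, then evict the least useful checkpoint) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the pairwise similarity (generalized Gram) matrix has already been computed from box and feature similarities, and it assumes the weights are obtained by solving `G w = 1` and normalizing, a rule this summary does not spell out. The names `solve`, `synergy_weights`, `assemble`, and `replace_weakest`, and the `ridge` parameter, are hypothetical.

```python
from typing import Dict, List

Checkpoint = Dict[str, List[float]]  # parameter name -> flattened values


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("similarity matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def synergy_weights(sim: List[List[float]], ridge: float = 1e-6) -> List[float]:
    """ASSUMPTION: weights are the normalized solution of (G + ridge*I) w = 1.

    Checkpoints that are highly similar to others share their weight;
    a checkpoint that is dissimilar to the rest receives a larger weight.
    """
    k = len(sim)
    reg = [[sim[i][j] + (ridge if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    raw = solve(reg, [1.0] * k)
    total = sum(raw)
    return [r / total for r in raw]


def assemble(bank: List[Checkpoint], weights: List[float]) -> Checkpoint:
    """Weighted average of checkpoint parameters (the assembled model)."""
    return {
        name: [sum(w * ckpt[name][i] for w, ckpt in zip(weights, bank))
               for i in range(len(bank[0][name]))]
        for name in bank[0]
    }


def replace_weakest(bank: List[Checkpoint], mean_sw: List[float],
                    new_ckpt: Checkpoint) -> List[Checkpoint]:
    """Drop the checkpoint with the lowest average weight, append the new one."""
    drop = min(range(len(bank)), key=lambda i: mean_sw[i])
    return bank[:drop] + bank[drop + 1:] + [new_ckpt]


# Toy similarity matrix: checkpoints 0 and 1 are near-duplicates, 2 is distinct.
toy_sim = [[1.0, 0.9, 0.1],
           [0.9, 1.0, 0.1],
           [0.1, 0.1, 1.0]]
toy_weights = synergy_weights(toy_sim)
```

In the toy example, the two near-duplicate checkpoints split their share while the distinct checkpoint receives the largest weight, which matches the stated goal of minimizing redundancy in the composite model. Note that solving `G w = 1` can produce negative weights for some similarity matrices; a practical system would need to handle that case.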

Paper Structure (28 sections, 12 equations, 9 figures, 11 tables, 1 algorithm)

Figures (9)

  • Figure 1: We investigate three distinct types of domain shifts that 3D detectors face during test time. Unlike previous work that focuses solely on cross-dataset shifts, our study comprehensively explores shifts arising from varying weather conditions, sensor corruptions, and the most challenging cross-corruption shifts, which encompass both object- and environment-related variations.
  • Figure 2: The results (AP$_\text{3D}$) of applying existing TTA methods to adapt $\operatorname{SECOND}$ (Yan et al., 2018) from nuScenes to KITTI. Mean-teacher models are in green.
  • Figure 3: Illustration of model synergy (MOS), which selects key checkpoints and assembles them into a super model $f^*_{t+1}$ tailored to each test batch $x_{t+1}$. "cyc." denotes cyclist and "ped." denotes pedestrian. MOS assigns higher weights to checkpoints whose insights are absent from the other checkpoints and reduces the weights of those carrying redundant knowledge.
  • Figure 4: The overall workflow of our approach.
  • Figure 5: Heatmap intuitively presenting the results of TTA-3OD across hybrid cross-corruption shifts (Waymo $\rightarrow$ KITTI-C) in heavy difficulty. Darker/lighter shades indicate lower/higher performance.
  • ...and 4 more figures