CodeMerge: Codebook-Guided Model Merging for Robust Test-Time Adaptation in Autonomous Driving
Huitong Yang, Zhuoxiao Chen, Fengyi Zhang, Zi Huang, Yadan Luo
TL;DR
The paper tackles robust test-time adaptation for 3D perception in autonomous driving under distribution shifts. It introduces CodeMerge, a latent-space, codebook-guided model merging approach that uses compact fingerprints from a fixed source model and ridge leverage scores to inform merging, avoiding repeated full-model inferences. Key contributions include the Model CodeBook, curvature-aware merge scoring, and sign-consistent merging. Comprehensive experiments show significant gains in end-to-end perception, mapping, and planning, plus notable efficiency improvements over prior merging methods. In practice, the approach enables fast, stable online adaptation in real-world driving scenarios with reduced memory and latency. Code is provided in the supplementary material.
Abstract
Maintaining robust 3D perception under dynamic and unpredictable test-time conditions remains a critical challenge for autonomous driving systems. Existing test-time adaptation (TTA) methods often fail in high-variance tasks like 3D object detection due to unstable optimization and sharp minima. While recent model merging strategies based on linear mode connectivity (LMC) offer improved stability by interpolating between fine-tuned checkpoints, they are computationally expensive, requiring repeated checkpoint access and multiple forward passes. In this paper, we introduce CodeMerge, a lightweight and scalable model merging framework that bypasses these limitations by operating in a compact latent space. Instead of loading full models, CodeMerge represents each checkpoint with a low-dimensional fingerprint derived from the source model's penultimate features and constructs a key-value codebook. We compute merging coefficients using ridge leverage scores on these fingerprints, enabling efficient model composition without compromising adaptation quality. Our method achieves strong performance across challenging benchmarks, improving end-to-end 3D detection by 14.9% NDS on nuScenes-C and LiDAR-based detection by over 7.6% mAP on nuScenes-to-KITTI, while benefiting downstream tasks such as online mapping, motion prediction and planning even without training. Code and pretrained models are released in the supplementary material.
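The abstract's central computation, scoring checkpoint fingerprints with ridge leverage scores and turning the scores into merging coefficients, can be illustrated with a minimal NumPy sketch. It uses the standard definition of the ridge leverage score, tau_i = f_i^T (F^T F + lam I)^{-1} f_i, where F stacks one fingerprint per row. Everything else is an assumption made for illustration and is not taken from the paper: the function names, the regularizer value `lam`, the choice to normalize the scores so they sum to one, and the plain weighted average of parameters (the sign-consistent merging step is not modeled here).

```python
import numpy as np


def ridge_leverage_scores(fingerprints, lam=0.1):
    """Ridge leverage score of each row of an (n, d) fingerprint matrix.

    tau_i = f_i^T (F^T F + lam * I)^{-1} f_i, which lies in [0, 1) for lam > 0.
    Rows that are redundant with other rows receive lower scores.
    """
    F = np.asarray(fingerprints, dtype=float)
    d = F.shape[1]
    gram = F.T @ F + lam * np.eye(d)
    # Solve gram @ X = F^T instead of forming an explicit inverse.
    X = np.linalg.solve(gram, F.T)          # shape (d, n)
    return np.einsum("ij,ji->i", F, X)      # diagonal of F @ X


def merge_coefficients(fingerprints, lam=0.1):
    """Assumed mapping from scores to weights: normalize to sum to one."""
    tau = ridge_leverage_scores(fingerprints, lam)
    return tau / tau.sum()


def merge_checkpoints(checkpoints, weights):
    """Weighted average of parameter dicts (hypothetical stand-in for merging)."""
    keys = checkpoints[0].keys()
    return {k: sum(w * ckpt[k] for w, ckpt in zip(weights, checkpoints))
            for k in keys}


# Toy example: two identical fingerprints and one distinct fingerprint.
fingerprints = np.array([[1.0, 0.0],
                         [1.0, 0.0],
                         [0.0, 1.0]])
tau = ridge_leverage_scores(fingerprints, lam=0.1)
weights = merge_coefficients(fingerprints, lam=0.1)

checkpoints = [{"w": np.full(2, 1.0)},
               {"w": np.full(2, 2.0)},
               {"w": np.full(2, 3.0)}]
merged = merge_checkpoints(checkpoints, weights)
```

In the toy example the two duplicated fingerprints each score 1/2.1 while the distinct one scores 1/1.1, so the checkpoint that contributes a new direction in fingerprint space receives the largest weight. Only the small fingerprint matrix is touched when computing the weights; the full checkpoints are read once, at merge time.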
