CodeMerge: Codebook-Guided Model Merging for Robust Test-Time Adaptation in Autonomous Driving

Huitong Yang, Zhuoxiao Chen, Fengyi Zhang, Zi Huang, Yadan Luo

TL;DR

The paper tackles robust test-time adaptation for 3D perception in autonomous driving under distribution shifts. It introduces CodeMerge, a latent-space, codebook-guided model merging approach that uses compact fingerprints from a fixed source model and ridge leverage scores to inform merging, avoiding repeated full-model inferences. Key contributions include the Model CodeBook, curvature-aware merge scoring, and sign-consistent merging, along with comprehensive experiments showing significant gains in end-to-end perception, mapping, and planning, plus notable efficiency improvements over prior merging methods. The approach has practical impact by enabling fast, stable online adaptation in real-world driving scenarios with reduced memory and latency, and code is provided in supplementary materials.

Abstract

Maintaining robust 3D perception under dynamic and unpredictable test-time conditions remains a critical challenge for autonomous driving systems. Existing test-time adaptation (TTA) methods often fail in high-variance tasks like 3D object detection due to unstable optimization and sharp minima. While recent model merging strategies based on linear mode connectivity (LMC) offer improved stability by interpolating between fine-tuned checkpoints, they are computationally expensive, requiring repeated checkpoint access and multiple forward passes. In this paper, we introduce CodeMerge, a lightweight and scalable model merging framework that bypasses these limitations by operating in a compact latent space. Instead of loading full models, CodeMerge represents each checkpoint with a low-dimensional fingerprint derived from the source model's penultimate features and constructs a key-value codebook. We compute merging coefficients using ridge leverage scores on these fingerprints, enabling efficient model composition without compromising adaptation quality. Our method achieves strong performance across challenging benchmarks, improving end-to-end 3D detection by 14.9% NDS on nuScenes-C and LiDAR-based detection by over 7.6% mAP on nuScenes-to-KITTI, while benefiting downstream tasks such as online mapping, motion prediction, and planning even without training. Code and pretrained models are released in the supplementary material.
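The abstract's idea of scoring checkpoints by ridge leverage scores on their fingerprints can be illustrated with a minimal sketch. It assumes the textbook ridge leverage score and a simple normalize-then-average merge rule; the function names, the regularizer `lam`, and the weighting rule are hypothetical and are not taken from the paper, whose curvature-aware scoring and sign-consistent merging may differ.

```python
import numpy as np


def ridge_leverage_scores(keys: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Textbook ridge leverage score of every row of `keys`.

    keys: (n_checkpoints, d) matrix, one fingerprint (codebook key) per row.
    lam:  ridge regularizer; the default is an arbitrary illustrative value.
    """
    d = keys.shape[1]
    gram = keys.T @ keys + lam * np.eye(d)      # (d, d), positive definite for lam > 0
    solved = np.linalg.solve(gram, keys.T)      # (d, n), equals gram^{-1} keys^T
    return np.einsum("nd,dn->n", keys, solved)  # k_i^T gram^{-1} k_i for each row i


def merge_checkpoints(values, scores):
    """Weighted average of checkpoint parameters (assumed merge rule).

    values: list of dicts mapping parameter name -> np.ndarray (codebook values).
    scores: one non-negative score per checkpoint.
    """
    weights = scores / scores.sum()             # normalize scores into coefficients
    merged = {
        name: sum(w * ckpt[name] for w, ckpt in zip(weights, values))
        for name in values[0]
    }
    return merged, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fingerprints = rng.normal(size=(5, 3))      # 5 checkpoints, 3-dim fingerprints
    checkpoints = [{"w": rng.normal(size=(2, 2))} for _ in range(5)]
    scores = ridge_leverage_scores(fingerprints, lam=0.5)
    merged, weights = merge_checkpoints(checkpoints, scores)
    print(weights, merged["w"].shape)
```

Only the small fingerprint matrix enters the scoring step, so no checkpoint has to be loaded or run to obtain the coefficients; the full parameters are touched once, at merge time.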

Paper Structure

This paper contains 16 sections, 12 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Overview of real-world test-time shifts (top) and 3D perception systems considered in this work (bottom). We study test-time adaptation (TTA) in two settings: (1) an end-to-end autonomous driving system and (2) a modular LiDAR-based detector, both affected by adverse weather and sensor failures. CodeMerge enables efficient TTA by leveraging compact fingerprints to guide model merging.
  • Figure 2: Conceptual comparison of model merging strategies for TTA. Unlike EMA (left), which ignores model behavior, or MOS (middle), which requires multiple inferences to compute merging weights, CodeMerge (right) leverages ridge leverage scores in a compact fingerprint space to efficiently guide model merging.
  • Figure 3: Pairwise fingerprint differences correlate strongly with model weight differences (Pearson $r$ and Kendall $\tau$ both above 0.7) for SparseDrive and SECOND (Yan et al., 2018), showing that the low-dimensional fingerprint space reliably reflects parameter-space structure.
  • Figure 4: Visualization of outputs of SparseDrive (bottom) and after CodeMerge adaptation (upper) under severe motion blur. TTA greatly improves detection by capturing more true positive instances, which consequently enhances downstream mapping and planning accuracy (right).
  • Figure 5: Visualization of outputs of SparseDrive (bottom) and after CodeMerge adaptation (upper) under severe ColorQuant, LowLight, and Snow. TTA greatly improves detection by capturing more true positive instances.
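The kind of check summarized in Figure 3 can be sketched as follows: compute a distance for every pair of checkpoints in fingerprint space and in flattened-weight space, then correlate the two lists. The Euclidean distance and the tau-a variant of Kendall's statistic are assumptions made for illustration; the paper's exact choices are not stated in this summary.

```python
import itertools

import numpy as np


def pairwise_distances(rows: np.ndarray) -> np.ndarray:
    """Euclidean distance for every unordered pair of rows, in a fixed pair order."""
    pairs = itertools.combinations(range(len(rows)), 2)
    return np.array([np.linalg.norm(rows[i] - rows[j]) for i, j in pairs])


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


def kendall_tau_a(x, y) -> float:
    """Kendall tau-a: (concordant - discordant) / number of pairs; ties add zero."""
    n = len(x)
    total = 0.0
    for i, j in itertools.combinations(range(n), 2):
        total += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return float(total / (n * (n - 1) / 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(6, 50))          # 6 hypothetical flattened checkpoints
    fingerprints = weights[:, :8]               # stand-in for low-dimensional fingerprints
    d_w = pairwise_distances(weights)
    d_f = pairwise_distances(fingerprints)
    print(pearson_r(d_f, d_w), kendall_tau_a(d_f, d_w))
```

The quadratic-time Kendall loop is adequate for a handful of checkpoints; a library routine would be preferable for larger collections.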

Theorems & Definitions (1)

  • Definition 1: Ridge Leverage Score (RLS)
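For orientation, the standard textbook form of the ridge leverage score is given below. The symbols $K$, $k_i$, and $\lambda$ are notation assumed here (a matrix whose rows are the fingerprints, its $i$-th row, and the ridge regularizer); the paper's Definition 1 may use different notation or a variant of this form.

```latex
% Assumed notation: K \in \mathbb{R}^{n \times d} stacks n fingerprints as rows k_i^\top,
% \lambda > 0 is the ridge regularizer, \sigma_j are the singular values of K.
\[
  \tau_i^{\lambda}(K) \;=\; k_i^{\top} \left( K^{\top} K + \lambda I_d \right)^{-1} k_i ,
  \qquad 0 \le \tau_i^{\lambda}(K) < 1 ,
\]
\[
  \sum_{i=1}^{n} \tau_i^{\lambda}(K)
  \;=\; \operatorname{tr}\!\left( K \left( K^{\top} K + \lambda I_d \right)^{-1} K^{\top} \right)
  \;=\; \sum_{j} \frac{\sigma_j^{2}}{\sigma_j^{2} + \lambda} .
\]
```

A row receives a high score when it points in a direction that the other rows cover poorly, which is why the score is commonly read as a measure of how much unique information a sample contributes.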