PEFT-DML: Parameter-Efficient Fine-Tuning Deep Metric Learning for Robust Multi-Modal 3D Object Detection in Autonomous Driving

Abdolazim Rezaei, Mehdi Sookhak

TL;DR

PEFT-DML addresses robust multi-modal 3D object detection for autonomous driving under sensor dropout and modality variability. It unifies LiDAR, radar, camera, IMU, and GNSS into a shared latent space and uses parameter-efficient LoRA adapters with lightweight fusion to enable fine-tuning while keeping backbones frozen. A joint loss comprising detection, metric alignment, and consistency terms drives cross-modal generalization and temporal stability. On nuScenes, it achieves state-of-the-art performance (mAP 62.2, NDS 71.7) while updating under 10% of parameters, demonstrating strong robustness to weather and sensor dropout and practical efficiency improvements.
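
One plausible way to write the joint loss described above is as a weighted sum of its three terms. The summary names only the terms themselves; the additive form and the weighting coefficients $\lambda_{\text{align}}$ and $\lambda_{\text{cons}}$ are assumptions made here for illustration, not values or notation taken from the paper.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{det}}
  + \lambda_{\text{align}}\,\mathcal{L}_{\text{align}}
  + \lambda_{\text{cons}}\,\mathcal{L}_{\text{cons}}
```

Under this reading, $\mathcal{L}_{\text{det}}$ is the detection loss, $\mathcal{L}_{\text{align}}$ is the metric alignment term that pulls embeddings from different modalities together in the shared latent space, and $\mathcal{L}_{\text{cons}}$ is the consistency term associated with temporal stability.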

Abstract

This study introduces PEFT-DML, a parameter-efficient deep metric learning framework for robust multi-modal 3D object detection in autonomous driving. Unlike conventional models that assume fixed sensor availability, PEFT-DML maps diverse modalities (LiDAR, radar, camera, IMU, GNSS) into a shared latent space, enabling reliable detection even under sensor dropout or unseen modality class combinations. By integrating Low-Rank Adaptation (LoRA) and adapter layers, PEFT-DML achieves significant training efficiency while enhancing robustness to fast motion, weather variability, and domain shifts. Experiments on the nuScenes benchmark demonstrate superior accuracy.
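
To make the LoRA idea concrete, the following is a minimal NumPy sketch of a single frozen dense layer with a trainable low-rank update. It is not the authors' implementation: the class name `LoRALinear`, the layer sizes, the rank, and the `alpha` scaling are all hypothetical choices for illustration, and the "pretrained" weight is just random data standing in for a backbone layer.

```python
import numpy as np


class LoRALinear:
    """Frozen dense layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, d_in, d_out, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen weight: random here, standing in for a pretrained backbone layer.
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # Trainable low-rank factors. B starts at zero, so the layer
        # initially reproduces the frozen output exactly.
        self.A = rng.standard_normal((rank, d_in)) * 0.01
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # x has shape (batch, d_in); the result has shape (batch, d_out).
        frozen = x @ self.W.T
        update = (x @ self.A.T) @ self.B.T
        return frozen + self.scale * update

    def trainable_fraction(self):
        # Share of this layer's parameters that would receive gradients.
        trainable = self.A.size + self.B.size
        return trainable / (trainable + self.W.size)


layer = LoRALinear(d_in=256, d_out=256, rank=4)
x = np.ones((2, 256))
y = layer.forward(x)
print(y.shape, round(layer.trainable_fraction(), 4))
```

With these assumed sizes, the low-rank factors account for roughly 3% of the layer's parameters, which illustrates how a LoRA-style setup can keep the trainable share small. The figure reported in the summary (under 10% of parameters) refers to the full PEFT-DML model, not to this toy layer.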

Paper Structure

This paper contains 4 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: The PEFT-DML pipeline unifies various modalities into a shared latent space with LoRA and adapter layers.
  • Figure 2: Comparison across different weather conditions.
  • Figure 3: PEFT-DML achieves nearly the same or slightly higher accuracy than Full Fine-Tuning (Full-FT) while updating less than 10% of the parameters.