PEFT-DML: Parameter-Efficient Fine-Tuning Deep Metric Learning for Robust Multi-Modal 3D Object Detection in Autonomous Driving
Abdolazim Rezaei, Mehdi Sookhak
TL;DR
PEFT-DML addresses robust multi-modal 3D object detection for autonomous driving under sensor dropout and modality variability. It maps LiDAR, radar, camera, IMU, and GNSS inputs into a shared latent space and uses parameter-efficient LoRA adapters with lightweight fusion, so the model can be fine-tuned while the backbones stay frozen. A joint loss comprising detection, metric alignment, and consistency terms drives cross-modal generalization and temporal stability. On nuScenes, it achieves state-of-the-art performance (mAP 62.2, NDS 71.7) while updating fewer than 10% of parameters, and it remains robust to weather variation and sensor dropout while delivering practical efficiency gains.
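As a rough illustration of the joint loss mentioned above, one common way to combine three such terms is a weighted sum. The weighted-sum form, the weights \lambda_{m} and \lambda_{c}, and the subscript names are assumptions made here for illustration; this summary does not state how the terms are actually combined.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{det}}
  + \lambda_{m}\,\mathcal{L}_{\text{metric}}
  + \lambda_{c}\,\mathcal{L}_{\text{cons}},
\qquad \lambda_{m},\ \lambda_{c} \ge 0
```

Here \mathcal{L}_{\text{det}} stands for the detection term, \mathcal{L}_{\text{metric}} for the metric alignment term, and \mathcal{L}_{\text{cons}} for the consistency term.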
Abstract
This study introduces PEFT-DML, a parameter-efficient deep metric learning framework for robust multi-modal 3D object detection in autonomous driving. Unlike conventional models that assume fixed sensor availability, PEFT-DML maps diverse modalities (LiDAR, radar, camera, IMU, GNSS) into a shared latent space, enabling reliable detection even under sensor dropout or unseen modality-class combinations. By integrating Low-Rank Adaptation (LoRA) and adapter layers, PEFT-DML achieves significant training efficiency while enhancing robustness to fast motion, weather variability, and domain shifts. Experiments on the nuScenes benchmark demonstrate superior accuracy.
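To make the Low-Rank Adaptation idea concrete, the following is a minimal, dependency-free sketch of a single LoRA-augmented linear layer: a frozen weight matrix W plus a trainable low-rank update scaled by alpha / rank. It is not the PEFT-DML implementation. The class name `LoRALinear`, the helper `matvec`, the layer sizes, the rank, and the scaling factor are all hypothetical choices for illustration.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, d_in, d_out, rank=4, alpha=8.0, seed=0):
        rng = random.Random(seed)
        # Frozen backbone weight: never updated during fine-tuning.
        self.W = [[rng.gauss(0, d_in ** -0.5) for _ in range(d_in)]
                  for _ in range(d_out)]
        # Trainable low-rank factors. B starts at zero, so the adapted
        # layer initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                     # frozen path
        update = matvec(self.B, matvec(self.A, x))   # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_fraction(self):
        """Share of this layer's parameters that fine-tuning would update."""
        d_out, d_in, rank = len(self.W), len(self.W[0]), len(self.A)
        trainable = rank * d_in + d_out * rank
        return trainable / (trainable + d_out * d_in)


layer = LoRALinear(d_in=128, d_out=128, rank=4)
print(f"trainable fraction: {layer.trainable_fraction():.3f}")
```

With these illustrative sizes, only the two small factors A and B would be trained, which amounts to roughly 6% of the layer's parameters. The share for a full model depends on which layers receive adapters and on the chosen rank.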
