LiDAR-Anchored Collaborative Distillation for Robust 2D Representations
Wonjun Jo, Hyunwoo Ha, Kim Ji-Yeon, Hawook Jeong, Tae-Hyun Oh
TL;DR
This work tackles the fragility of self-supervised 2D image encoders in adverse weather by introducing a LiDAR-anchored Collaborative Distillation framework. It employs a two-stage, cross-modal approach: Stage 1 pre-aligns LiDAR features to the clear-day 2D feature space, and Stage 2 uses these aligned LiDAR features as supervision to denoise and regularize degraded 2D representations. The method improves in-domain and out-of-domain semantic segmentation and depth estimation, enhances 3D awareness, and generalizes well across outdoor and indoor datasets. In practice, this yields more robust perception pipelines for vision-based systems operating under real-world, degraded conditions.
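The two-stage idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-distance objective, the function names (`stage1_loss`, `stage2_loss`), and the toy feature values are all assumptions chosen for illustration, and the summary here does not state the actual losses or architectures. The sketch only shows which encoder plays student and which plays frozen teacher in each stage.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distillation_loss(student_feats, teacher_feats):
    """Mean cosine distance over paired per-location feature vectors."""
    pairs = list(zip(student_feats, teacher_feats))
    return sum(cosine_distance(s, t) for s, t in pairs) / len(pairs)


def stage1_loss(lidar_feats, clear_image_feats):
    # Stage 1 (assumed form): the LiDAR encoder is the student and the
    # frozen 2D encoder, run on clear-day images, is the teacher.
    return distillation_loss(lidar_feats, clear_image_feats)


def stage2_loss(degraded_image_feats, aligned_lidar_feats):
    # Stage 2 (assumed form): the 2D encoder is the student on degraded
    # images and the aligned LiDAR features act as the frozen teacher.
    return distillation_loss(degraded_image_feats, aligned_lidar_feats)


# Hypothetical toy features: two spatial locations, two channels each.
clear_feats = [[1.0, 0.0], [0.0, 1.0]]
lidar_feats = [[1.0, 0.0], [0.0, 2.0]]      # same directions as clear_feats
degraded_feats = [[0.0, 1.0], [0.0, 1.0]]   # first location is corrupted

print(stage1_loss(lidar_feats, clear_feats))     # 0.0
print(stage2_loss(degraded_feats, lidar_feats))  # 0.5
```

In this toy setting the LiDAR features are already aligned with the clear-day image features, so the Stage 1 loss is zero, while the corrupted location in the degraded image features is what the Stage 2 loss penalizes.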
Abstract
As deep learning continues to advance, self-supervised learning has made considerable strides. It allows 2D image encoders to extract useful features for various downstream tasks, including those related to vision-based systems. Nevertheless, pre-trained 2D image encoders fall short on these tasks under noisy and adverse weather conditions beyond clear daytime scenes, conditions that demand robust visual perception. To address these issues, we propose a novel self-supervised approach, Collaborative Distillation, which leverages 3D LiDAR as self-supervision to make 2D image encoders more robust to noisy and adverse weather conditions while retaining their original capabilities. Our method outperforms competing methods in various downstream tasks across diverse conditions and exhibits strong generalization ability. In addition, our method improves 3D awareness, which stems from LiDAR's characteristics. This advancement highlights our method's practicality and adaptability in real-world scenarios.
