RCDN: Towards Robust Camera-Insensitivity Collaborative Perception via Dynamic Feature-based 3D Neural Modeling

Tianhang Wang, Fan Lu, Zehan Zheng, Zhijun Li, Guang Chen, Changjun Jiang

TL;DR

RCDN tackles camera-insensitivity in multi-agent collaborative perception with a dynamic feature-based 3D neural modeling framework that builds a time-invariant static background field and a time-varying dynamic foreground field. The method uses a geometry BEV volume feature and hash-grid-based neural fields to render repaired views without extra inference bandwidth, enabling robust perception under unpredictable camera failures. Training combines static and dynamic losses with motion-consistency constraints, expressed as $\mathcal{L}_{total} = \lambda_1\mathcal{L}_{static} + \lambda_2\mathcal{L}_{dyn} + \lambda_3\mathcal{L}_{opt} + \lambda_4\mathcal{L}_{cyc}$. On the OPV2V-N dataset, RCDN can be ported to other baselines and improves their robustness under extreme camera-insensitivity settings. The work also contributes OPV2V-N, a large-scale dataset reflecting realistic camera failure scenarios, which enables broader evaluation and downstream task transfer, and it offers practical implications for reliable multi-agent collaboration with limited calibration and communication overhead.
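The weighted sum in the TL;DR can be illustrated with a minimal Python sketch. The names `LossWeights` and `total_loss` are hypothetical, and the default weights of 1.0 are placeholders rather than values reported by the paper; the four inputs are assumed to be scalar loss terms already computed elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Hypothetical container for lambda_1..lambda_4 (placeholder defaults)."""
    static: float = 1.0  # lambda_1, weight of L_static
    dyn: float = 1.0     # lambda_2, weight of L_dyn
    opt: float = 1.0     # lambda_3, weight of L_opt
    cyc: float = 1.0     # lambda_4, weight of L_cyc


def total_loss(l_static: float, l_dyn: float, l_opt: float, l_cyc: float,
               w: LossWeights = LossWeights()) -> float:
    """Return L_total as the weighted sum of the four scalar loss terms."""
    return (w.static * l_static
            + w.dyn * l_dyn
            + w.opt * l_opt
            + w.cyc * l_cyc)


if __name__ == "__main__":
    # Example with made-up loss values and weights, for illustration only.
    weights = LossWeights(static=0.5, dyn=0.25, opt=1.0, cyc=2.0)
    print(total_loss(1.0, 2.0, 3.0, 4.0, weights))
```

Keeping the weights in one immutable object makes it easy to log or sweep them; in an actual training loop the same arithmetic would presumably be applied to tensors rather than floats.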

Abstract

Collaborative perception aims to overcome the constraints of single-agent perception, such as occlusions, by using the multi-view sensor inputs of multiple agents. However, most existing works assume the ideal condition that all agents' multi-view cameras are continuously available. In reality, cameras may be highly noisy, obscured, or may even fail during the collaboration. In this work, we introduce a new robust camera-insensitivity problem: how to overcome the issues caused by failed camera perspectives while maintaining high collaborative performance at low calibration cost? To address this problem, we propose RCDN, a Robust Camera-insensitivity collaborative perception method with a novel Dynamic feature-based 3D Neural modeling mechanism. The key intuition of RCDN is to construct collaborative neural rendering field representations to recover failed perceptual messages sent by multiple agents. To better model the collaborative neural rendering field, RCDN first establishes a time-invariant static field, based on geometry BEV features, with other agents via fast hash grid modeling. Building on the static background field, the proposed time-varying dynamic field models the corresponding motion vectors for foregrounds at the appropriate positions. To validate RCDN, we create OPV2V-N, a new large-scale dataset with manual labelling under different camera failure scenarios. Extensive experiments conducted on OPV2V-N show that RCDN can be ported to other baselines and improves their robustness in extreme camera-insensitivity settings.
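To make "fast hash grid modeling" concrete, here is a minimal Python sketch of a spatial hash that maps an integer voxel coordinate to a slot in a fixed-size feature table, in the style of multiresolution hash encodings used for neural fields. Whether RCDN uses this exact hash function is an assumption; the abstract does not specify it. The function name `hash_voxel`, the prime constants, and the table size are illustrative choices.

```python
# Large primes commonly used for spatial hashing of 3D grid coordinates.
# Assumption: RCDN's actual hash function is not given in the abstract.
PRIMES = (1, 2654435761, 805459861)


def hash_voxel(ix: int, iy: int, iz: int, table_size: int) -> int:
    """Map integer voxel coordinates to an index in [0, table_size)."""
    if table_size <= 0:
        raise ValueError("table_size must be positive")
    # XOR the per-axis products, then wrap into the table.
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size


if __name__ == "__main__":
    table_size = 2 ** 14  # hypothetical number of feature slots
    for voxel in [(0, 0, 0), (1, 0, 0), (12, 7, 3)]:
        print(voxel, "->", hash_voxel(*voxel, table_size))
```

The lookup costs constant time and memory regardless of scene extent, which is the usual reason hash grids are described as fast; distinct voxels can collide in the same slot, and such encodings typically rely on training to resolve those collisions.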

Paper Structure (30 sections, 13 equations, 21 figures, 9 tables)

Figures (21)

  • Figure 1: Illustration of noisy camera situations (blurred, occluded, and even failed) during collaboration, and the perception results without and with RCDN. Orange marks drivable-area segmentation, blue marks lanes, and teal marks dynamic vehicles.
  • Figure 2: System overview. The geometry BEV generation module provides feature sampling for the later stages. The collaborative static and dynamic fields run in parallel to model the background and foreground, respectively. Note that MCP is short for the multi-agent collaborative perception process.
  • Figure 2: Ablation Study on OPV2V-N dataset.
  • Figure 3: Performance of other baseline methods without and with the proposed RCDN as the number of randomly noisy (failed) cameras ranges from 0 to 3. RCDN can be ported to other baseline methods and stabilizes their performance under different levels of camera failure on the OPV2V-N dataset.
  • Figure 4: Effectiveness of dynamic neural field.
  • ...and 16 more figures