Move What Matters: Parameter-Efficient Domain Adaptation via Optimal Transport Flow for Collaborative Perception

Zesheng Jia, Jin Wang, Siao Liu, Lingzhi Li, Ziyao Huang, Yunjiang Xu, Jianping Wang

TL;DR

This work addresses sim-to-real domain gaps in V2X collaborative perception by reframing adaptation as an optimal-transport problem. It introduces FlowAdapt, which combines Wasserstein Greedy Sampling to filter redundant spatio-temporal data and Progressive Knowledge Transfer to pass compressed early-layer semantics to later stages, all while updating only a small parameter subset $\Delta\Theta$. The method achieves state-of-the-art results with about 1% trainable parameters across OPV2V, DAIR-V2X, and V2XSet, and exhibits robustness to localization noise, backed by theoretical guarantees such as $R(\mathcal{S}_{\mathrm{WGS}}) \le 2R^*$ for coverage. These contributions offer a data- and computation-efficient pathway for reliable multi-agent perception in diverse environments, enabling practical deployment in real-world V2X systems.

Abstract

Fast domain adaptation remains a fundamental challenge for deploying multi-agent systems across diverse environments in Vehicle-to-Everything (V2X) collaborative perception. Despite the success of Parameter-Efficient Fine-Tuning (PEFT) in natural language processing and conventional vision tasks, directly applying PEFT to multi-agent settings leads to significant performance degradation and training instability. In this work, we conduct a detailed analysis and identify two key factors: (i) inter-frame redundancy in heterogeneous sensory streams, and (ii) erosion of fine-grained semantics in deep-layer representations under PEFT adaptation. To address these issues, we propose FlowAdapt, a parameter-efficient framework grounded in optimal transport theory, which minimizes information transport costs across both data distributions and network hierarchies. Specifically, we introduce a Wasserstein Greedy Sampling strategy to selectively filter redundant samples via a bounded covering radius. Furthermore, a Progressive Knowledge Transfer module is designed to progressively inject compressed early-stage representations into later stages through learnable pathways, alleviating semantic degradation in late-stage adaptation. Extensive experiments on three benchmarks demonstrate that FlowAdapt achieves state-of-the-art performance with only 1% of trainable parameters, effectively bridging domain gaps with superior sample efficiency and generalization.
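
The idea of filtering redundant samples under a bounded covering radius can be illustrated with classic greedy farthest-point (k-center) selection, whose well-known guarantee has the same form as the $R(\mathcal{S}_{\mathrm{WGS}}) \le 2R^*$ bound quoted in the TL;DR. The sketch below is an illustration under stated assumptions, not the paper's Wasserstein Greedy Sampling: the function name `greedy_k_center`, the plain Euclidean distance, and the toy 2-D "frame features" are all hypothetical stand-ins, since this summary does not specify the actual feature space or transport cost.

```python
from typing import List, Tuple

import numpy as np


def greedy_k_center(features: np.ndarray, k: int, start: int = 0) -> Tuple[List[int], float]:
    """Greedy farthest-point selection over row vectors.

    Assumption: Euclidean distance stands in for whatever cost the paper uses.
    Returns the selected row indices and the resulting covering radius.
    """
    n = features.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")
    selected = [start]
    # Distance from every sample to its nearest selected sample so far.
    min_dist = np.linalg.norm(features - features[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))  # the sample that is currently worst covered
        selected.append(nxt)
        new_dist = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    # Covering radius: largest distance from any sample to its nearest selected one.
    return selected, float(min_dist.max())


# Toy data: three tight clusters of near-duplicate "frames" (hypothetical features).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
frames = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in centers])

idx, radius = greedy_k_center(frames, k=3)
clusters_hit = sorted(i // 20 for i in idx)  # which cluster each pick came from
```

On this toy data the three picks land in three different clusters, so 57 of the 60 near-duplicate frames are dropped while every frame stays close to a retained one. Each greedy step picks the currently worst-covered sample, which is why the covering radius of the selected set is at most twice the optimal radius for the same budget.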

Paper Structure (18 sections, 12 equations, 7 figures, 4 tables, 1 algorithm)

This paper contains 18 sections, 12 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: FlowAdapt tackles dual challenges in collaborative perception adaptation: eliminating redundant spatio-temporal samples through Wasserstein greedy sampling, and bridging isolated stage computations through progressive knowledge distillation.
  • Figure 2: Left: Performance versus temporal stride (interval between consecutively selected frames) with fixed 10% sampling ratio. Right: Performance versus sampling ratio with sequential selection. Performance saturates beyond 60% (orange-shaded zone).
  • Figure 3: We extract feature visualizations at three network depths during adaptation: shallow (after voxelization), middle (backbone intermediate), and deep (before detection head). (a-b) show sparse activation in middle and deep layers. (c-d) show middle and deep layer visualizations after injecting compressed shallow features.
  • Figure 4: Overview of FlowAdapt. Wasserstein Greedy Sampling selects representative samples by minimizing coverage radius in spatio-temporal feature space. Progressive Knowledge Transfer operates across network stages: Collaborative Agent Prompts and Dual-Path Adapters enhance feature representations through intra-group aggregation and complementary spatial-channel processing, while the compression-injection mechanism transfers early-stage knowledge to later stages to mitigate semantic erosion.
  • Figure 5: Performance comparison under different localization noise levels when adapting from OPV2V to DAIR-V2X.
  • ...and 2 more figures
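
The two frame-selection schemes compared in Figure 2 can be made concrete with a small indexing sketch. This is only one plausible reading of the caption: the helper names `strided_indices` and `sequential_indices`, the choice to start at frame 0, and the rounding of the budget are assumptions, not details taken from the paper.

```python
from typing import List


def strided_indices(num_frames: int, ratio: float, stride: int) -> List[int]:
    """Pick every `stride`-th frame, capped at a fixed budget of ratio * num_frames.

    Assumption: selection starts at frame 0 and does not wrap around, so a very
    large stride can yield fewer frames than the budget allows.
    """
    budget = max(1, round(num_frames * ratio))
    return list(range(0, num_frames, stride))[:budget]


def sequential_indices(num_frames: int, ratio: float) -> List[int]:
    """Pick the first ratio * num_frames frames in their original order."""
    budget = max(1, round(num_frames * ratio))
    return list(range(budget))


# Left panel setting: fixed 10% budget, varying the gap between selected frames.
dense = strided_indices(num_frames=100, ratio=0.1, stride=1)   # 10 adjacent frames
spread = strided_indices(num_frames=100, ratio=0.1, stride=5)  # 10 frames, 5 apart

# Right panel setting: sequential selection at a growing sampling ratio.
sixty_percent = sequential_indices(num_frames=100, ratio=0.6)
```

With the same budget of 10 frames, `dense` covers only frames 0 to 9 while `spread` reaches frame 45, which is the kind of redundancy-versus-coverage trade-off the left panel varies.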