AlterMOMA: Fusion Redundancy Pruning for Camera-LiDAR Fusion Models with Alternative Modality Masking

Shiqi Sun, Yantao Lu, Ning Liu, Bo Jiang, JinChao Chen, Ying Zhang

TL;DR

AlterMOMA tackles fusion-induced feature redundancy when pruning camera-LiDAR fusion models by introducing Alternative Modality Masking and Redundancy Reactivation, coupled with AlterEva to score parameter importance. The method alternates masking between camera and LiDAR backbones, uses DeCI and ReRI indicators based on Taylor-approximate loss changes to compute per-parameter scores, and prunes using a global threshold before fine-tuning. Extensive experiments on nuScenes and KITTI across BEV fusion architectures demonstrate state-of-the-art pruning performance, outperforming single-modal pruning baselines in both unstructured and structured pruning settings. The work offers a practical route to compress multi-sensor fusion models without sacrificing accuracy and provides a framework with potential extensions to other multi-modal fusion systems, while highlighting sensitivity to hyperparameters and domain-specific considerations.
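
The last step named in the summary, pruning with a single global threshold over per-parameter scores, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `global_prune_masks`, the dictionary layout, and the example scores are hypothetical, and the scores are taken as given rather than computed by AlterEva.

```python
def global_prune_masks(scores_by_backbone, prune_ratio):
    """Build 0/1 keep-masks from importance scores using one global threshold.

    Scores from all backbones are pooled, so the pruning budget is not split
    evenly: a backbone with many low scores loses more parameters.
    (Hypothetical helper for illustration only.)
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")

    # Pool every score across backbones and sort ascending.
    pooled = sorted(s for scores in scores_by_backbone.values() for s in scores)
    n_prune = int(len(pooled) * prune_ratio)
    if n_prune == 0:
        return {name: [1] * len(s) for name, s in scores_by_backbone.items()}

    # Largest score that still gets pruned. Tied scores are pruned together,
    # so ties can remove slightly more than n_prune parameters.
    threshold = pooled[n_prune - 1]
    return {
        name: [0 if s <= threshold else 1 for s in scores]
        for name, scores in scores_by_backbone.items()
    }


# Made-up scores for two tiny "backbones" with four parameters each.
example_scores = {
    "camera": [0.9, 0.1, 0.4, 0.05],
    "lidar": [0.8, 0.7, 0.2, 0.6],
}
masks = global_prune_masks(example_scores, prune_ratio=0.5)
print(masks)  # {'camera': [1, 0, 0, 0], 'lidar': [1, 1, 0, 1]}
```

In this toy case the 50% budget removes three camera parameters and one LiDAR parameter, which is the practical difference between a global threshold and a fixed per-backbone ratio.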

Abstract

Camera-LiDAR fusion models significantly enhance perception performance in autonomous driving. The fusion mechanism leverages the strengths of each modality while minimizing their weaknesses. Moreover, in practice, camera-LiDAR fusion models utilize pre-trained backbones for efficient training. However, we argue that directly loading single-modal pre-trained camera and LiDAR backbones into camera-LiDAR fusion models introduces redundancy from similar features across modalities due to the nature of the fusion mechanism. Unfortunately, existing pruning methods are developed explicitly for single-modal models, and thus they struggle to effectively identify these specific redundant parameters in camera-LiDAR fusion models. In this paper, to address this issue, we propose a novel pruning framework, Alternative Modality Masking Pruning (AlterMOMA), which applies alternative masking to each modality and identifies the redundant parameters. Specifically, when one modality's parameters are masked (deactivated), the absence of features from the masked backbone compels the model to reactivate previously redundant features of the other modality's backbone. These redundant features and the corresponding redundant parameters can therefore be identified via the reactivation process. The redundant parameters can then be pruned using our proposed importance score evaluation function, Alternative Evaluation (AlterEva), which is based on observing the loss changes when certain modality parameters are activated and deactivated. Extensive experiments on the nuScenes and KITTI datasets, encompassing diverse tasks, baseline models, and pruning algorithms, show that AlterMOMA outperforms existing pruning methods, attaining state-of-the-art performance.
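
The idea of scoring parameters by the loss change observed while one modality is masked can be made concrete with a toy example. The sketch below is an assumption-laden illustration, not AlterEva: the linear "fusion model", the squared-error loss, all weights and inputs, and the function names are hypothetical. The score |θ · ∂L/∂θ| is a generic first-order Taylor estimate of the loss change from removing a parameter, and the reactivation (retraining) step between maskings is omitted.

```python
def forward(w_c, w_l, x_c, x_l, mu_c=1, mu_l=1):
    """Toy fusion: sum of two linear 'backbones', each gated by a 0/1 mask."""
    cam = mu_c * sum(w * x for w, x in zip(w_c, x_c))
    lid = mu_l * sum(w * x for w, x in zip(w_l, x_l))
    return cam + lid


def taylor_scores(weights, inputs, error, mask):
    """First-order Taylor scores |w * dL/dw| for the loss L = 0.5 * error**2.

    For this toy model, dL/dw_i = error * mask * x_i.
    """
    return [abs(w * error * mask * x) for w, x in zip(weights, inputs)]


def alternate_masking_scores(w_c, w_l, x_c, x_l, target):
    """Score both backbones with no mask, then with each modality masked."""
    settings = {
        "both": (1, 1),
        "lidar_masked": (1, 0),   # mu_l = 0
        "camera_masked": (0, 1),  # mu_c = 0
    }
    results = {}
    for name, (mu_c, mu_l) in settings.items():
        error = forward(w_c, w_l, x_c, x_l, mu_c, mu_l) - target
        results[name] = {
            "camera": taylor_scores(w_c, x_c, error, mu_c),
            "lidar": taylor_scores(w_l, x_l, error, mu_l),
        }
    return results


# Hypothetical numbers: each backbone alone predicts 1.0, the target is 2.2.
w_c, x_c = [0.8, 0.4], [1.0, 0.5]
w_l, x_l = [0.5, 0.25], [1.0, 2.0]
scores = alternate_masking_scores(w_c, w_l, x_c, x_l, target=2.2)
for setting, per_backbone in scores.items():
    print(setting, per_backbone)
```

With both backbones active the residual error is small, so every score is small. Masking LiDAR enlarges the error, and the camera scores rise roughly sixfold. A score taken only from the unmasked fusion model therefore cannot distinguish a parameter that is useless from one that is merely covered by the other modality.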

Paper Structure (19 sections, 1 theorem, 32 equations, 3 figures, 6 tables, 1 algorithm)


Key Result

Proposition 1

For a camera-LiDAR fusion model with parameters $\theta_c$ for the camera backbone and parameters $\theta_l$ for the LiDAR backbone, we can mask one of the backbones using masks $\mu_l = 0$ for the LiDAR backbone and $\mu_c = 0$ for the camera backbone. Take the models with masking LiDAR backbone as

Figures (3)

  • Figure 1: Motivating example of fusion-redundant features in the 3D object detection task. We run backward propagation on camera-LiDAR fusion models with pre-trained backbones and compare the gradients (a proxy for feature utilization) when only the camera backbone is used against when both the camera and LiDAR backbones are used. Notably, certain pre-trained parameters in the camera backbone become redundant once LiDAR information is added. This reveals that similar features are extracted across modalities, which introduces additional redundancy when camera-LiDAR fusion models directly load single-modal pre-trained backbones.
  • Figure 2: Overview of AlterMOMA: The framework begins with Modality Masking, where one of the backbones is initially masked. This step is followed by Redundancy Reactivation and Importance Evaluation, where the parameter importance scores are initially calculated with AlterEva. Afterward, the models undergo Reinitialization and Alternative Masking of the other backbone, leading to another round of Redundancy Reactivation and Importance Evaluation. Once AlterEva has computed scores for all backbone parameters (detailed in the paper's section on importance evaluation), the models are pruned to remove parameters with low importance scores and then fine-tuned. Notably, black lines represent model parameters and red lines represent reactivated fusion-redundant parameters. The thickness of each line indicates the contribution of the parameter.
  • Figure 3: Ablation study of hyperparameters $\alpha$ and $\beta$ on the nuScenes validation set. We show the relationship between mAP and $\beta/\alpha$ for our approach at 80%, 85%, and 90% pruning ratios. The two baseline models, BEVfusion-mit and BEVfusion-pku, are trained with SwinT and VoxelNet backbones.

Theorems & Definitions (2)

  • Proposition
  • proof