DLRMamba: Distilling Low-Rank Mamba for Edge Multispectral Fusion Object Detection

Qianqian Zhang, Leon Tabaro, Ahmed M. Abdelmoniem, Junshe An

TL;DR

The paper proposes the Low-Rank Two-Dimensional Selective Structured State Space Model (Low-Rank SS2D), which reformulates state transitions via matrix factorization to exploit intrinsic feature sparsity, and introduces a Structure-Aware Distillation strategy that aligns the internal latent state dynamics of the student with those of a full-rank teacher model to compensate for potential representation degradation.

Abstract

Multispectral fusion object detection is a critical task for edge-based maritime surveillance and remote sensing, demanding both high inference efficiency and robust feature representation for high-resolution inputs. However, current State Space Models (SSMs) like Mamba suffer from significant parameter redundancy in their standard 2D Selective Scan (SS2D) blocks, which hinders deployment on resource-constrained hardware and leads to the loss of fine-grained structural information during conventional compression. To address these challenges, we propose the Low-Rank Two-Dimensional Selective Structured State Space Model (Low-Rank SS2D), which reformulates state transitions via matrix factorization to exploit intrinsic feature sparsity. Furthermore, we introduce a Structure-Aware Distillation strategy that aligns the internal latent state dynamics of the student with a full-rank teacher model to compensate for potential representation degradation. This approach substantially reduces computational complexity and memory footprint while preserving the high-fidelity spatial modeling required for object recognition. Extensive experiments on five benchmark datasets and real-world edge platforms, such as Raspberry Pi 5, demonstrate that our method achieves a superior efficiency-accuracy trade-off, significantly outperforming existing lightweight architectures in practical deployment scenarios.
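To make the parameter saving from matrix factorization concrete, here is a minimal sketch of a generic low-rank projection. It is not the paper's Low-Rank SS2D block: the function names, the shapes, and the rank are hypothetical, and the only idea illustrated is replacing a dense `d_out x d_in` matrix `W` with the product of two thin factors `U` (`d_out x rank`) and `V` (`rank x d_in`).

```python
# Illustrative only: generic low-rank factorization W ~= U @ V.
# All names and sizes are assumptions, not taken from the paper.

def full_rank_params(d_out, d_in):
    """Parameter count of a dense d_out x d_in projection."""
    return d_out * d_in


def low_rank_params(d_out, d_in, rank):
    """Parameter count of U (d_out x rank) plus V (rank x d_in)."""
    return rank * (d_out + d_in)


def compression_ratio(d_out, d_in, rank):
    """How many times fewer parameters the factorized form stores."""
    return full_rank_params(d_out, d_in) / low_rank_params(d_out, d_in, rank)


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def low_rank_apply(u, v, x):
    """Compute U @ (V @ x) without materializing the dense product U @ V."""
    return matvec(u, matvec(v, x))
```

Under these assumptions, a 256 x 256 projection factorized at rank 16 stores 8,192 parameters instead of 65,536, an 8x reduction. Applying `V` first and `U` second also keeps the per-vector cost proportional to `rank * (d_out + d_in)` rather than `d_out * d_in`.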

Paper Structure (32 sections, 12 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Structural comparison of full-rank vs. low-rank SS2D. By significantly reducing computational overhead while maintaining representational power, the low-rank design opens up new avenues for efficient vision computing on resource-constrained edge devices.
  • Figure 2: Overview of the VMamba backbone and SS2D module.
  • Figure 3: Overview of the proposed DLRMamba framework. The proposed framework consists of four core components: (1) A pixel-level multispectral modality fusion module, which is designed to effectively fuse and process visible and infrared spectral information; (2) Low-Rank Structured State Space Modeling (Low-Rank SS2D), which is integrated to realize model lightweighting; (3) A structure-aware distillation (SAD) mechanism, including Singular Value Decomposition (SVD) Alignment (Matrix-level Distillation), Hidden State Sequence Alignment (Dynamic Distillation), and Feature Reconstruction (Output-level Distillation), which is proposed to compensate for performance degradation induced by model compression; (4) A detection head, which is used to output the final detection results.
  • Figure 4: Sample RGB–IR pairs from five datasets (top: RGB; bottom: IR).
  • Figure 5: Visual comparison of detection results produced by our approach and competing approaches on the VEDAI dataset under various challenging scenarios. Subfigures (a) and (d) illustrate detection performance in the presence of tree occlusions. Subfigure (b) presents results in an extremely dense scene containing numerous objects of different scales and categories. Subfigures (c) and (e) demonstrate cases where background objects exhibit high visual similarity to the target objects. Red circles denote misclassified objects (incorrect category prediction), blue circles indicate false positives (detections of non-existent objects), and yellow circles represent missed detections of ground-truth objects.
  • ...and 2 more figures
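
The Figure 3 caption names three distillation levels (matrix-level SVD alignment, hidden state sequence alignment, and output-level feature reconstruction). One plausible way to combine them is a weighted sum of three squared-error terms, sketched below. The paper's own equations are not reproduced in this listing, so every symbol here is an assumption: the weights \lambda, the teacher and student state matrices W^{tea} and W^{stu}, the singular-value vectors \sigma, the hidden states h_t over a scan of length L, and the block outputs Y.

```latex
% Hypothetical formulation; not copied from the paper.
\mathcal{L}_{\mathrm{SAD}}
  = \lambda_{\mathrm{svd}}\,\mathcal{L}_{\mathrm{svd}}
  + \lambda_{\mathrm{hid}}\,\mathcal{L}_{\mathrm{hid}}
  + \lambda_{\mathrm{out}}\,\mathcal{L}_{\mathrm{out}}

% Matrix-level: match the student's singular values to the teacher's top-r ones.
\mathcal{L}_{\mathrm{svd}}
  = \bigl\lVert \sigma_{1:r}\bigl(W^{\mathrm{tea}}\bigr)
  - \sigma_{1:r}\bigl(W^{\mathrm{stu}}\bigr) \bigr\rVert_2^2

% Dynamic: match hidden states step by step along the scan.
\mathcal{L}_{\mathrm{hid}}
  = \frac{1}{L} \sum_{t=1}^{L}
    \bigl\lVert h_t^{\mathrm{tea}} - h_t^{\mathrm{stu}} \bigr\rVert_2^2

% Output-level: match the block outputs.
\mathcal{L}_{\mathrm{out}}
  = \bigl\lVert Y^{\mathrm{tea}} - Y^{\mathrm{stu}} \bigr\rVert_F^2
```

This sketch assumes the teacher and student hidden states share the same dimensionality; if they do not, a learned projection would be needed before the hidden-state term can be computed.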