Binarized Mamba-Transformer for Lightweight Quad Bayer HybridEVS Demosaicing

Shiyang Zhou, Haijin Zeng, Yunfan Lu, Tong Shao, Ke Tang, Yongyong Chen, Jie Liu, Jingyong Su

TL;DR

Demosaicing Quad Bayer HybridEVS images on mobile devices is computationally expensive when the model relies on global dependency modeling. The authors introduce BMTNet, a lightweight binarized Mamba-Transformer that combines a Bi-Mamba-Transformer core with a binarized global visual encoder and keeps the Selective Scan in full precision to preserve accuracy while reducing parameters and FLOPs. BMTNet achieves higher PSNR than other BNNs and results competitive with full-precision models across several datasets, which makes edge deployment for HybridEVS practical. The work extends binarized networks and state space models to vision tasks and offers a scalable option for real-world demosaicing on resource-constrained devices.

Abstract

Quad Bayer demosaicing is the central challenge for enabling the widespread application of Hybrid Event-based Vision Sensors (HybridEVS). Although existing learning-based methods that leverage long-range dependency modeling have achieved promising results, their complexity severely limits deployment on mobile devices for real-world applications. To address these limitations, we propose BMTNet, a lightweight Mamba-based binary neural network designed for efficient and high-performing demosaicing of HybridEVS RAW images. First, to effectively capture both global and local dependencies, we introduce a hybrid Binarized Mamba-Transformer architecture that combines the strengths of the Mamba and Swin Transformer architectures. Next, to significantly reduce computational complexity, we propose a binarized Mamba (Bi-Mamba), which binarizes all projections while retaining the core Selective Scan in full precision. Bi-Mamba also incorporates additional global visual information to enhance global context and mitigate precision loss. We conduct quantitative and qualitative experiments to demonstrate the effectiveness of BMTNet in both performance and computational efficiency, providing a lightweight demosaicing solution suited for real-world edge devices. Our code and models are available at https://github.com/Clausy9/BMTNet.
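The split described in the abstract (1-bit projections, full-precision Selective Scan) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the per-row scaling factor (mean absolute weight, in the style of XNOR-Net), the scalar hidden state, and the function names `binary_linear` and `selective_scan` are all assumptions made for illustration.

```python
def sign(v):
    # Binarize to {-1, +1}; zero is mapped to +1 by convention in this sketch.
    return 1.0 if v >= 0 else -1.0


def binary_linear(x, W):
    """Project x with 1-bit weights and 1-bit activations, then rescale.

    Assumption: one full-precision scale per output row, equal to the mean
    absolute value of that row's weights.
    """
    xb = [sign(v) for v in x]
    out = []
    for row in W:
        alpha = sum(abs(w) for w in row) / len(row)
        acc = sum(sign(w) * v for w, v in zip(row, xb))
        out.append(alpha * acc)
    return out


def selective_scan(xs, a, b, c):
    """Full-precision recurrence h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t.

    Assumption: a scalar state and precomputed per-step parameters; in a
    Mamba block these parameters depend on the input.
    """
    h, ys = 0.0, []
    for x_t, a_t, b_t, c_t in zip(xs, a, b, c):
        h = a_t * h + b_t * x_t
        ys.append(c_t * h)
    return ys


projected = binary_linear([0.3, -0.7], [[0.5, -1.5], [-2.0, 2.0]])
scanned = selective_scan([1.0, 2.0, 3.0], [0.5] * 3, [1.0] * 3, [1.0] * 3)
```

Under these assumptions, only the projections are reduced to sign operations, while the recurrence accumulates in floating point. This reflects the motivation stated in the abstract for keeping the scan in full precision.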

Paper Structure

This paper contains 19 sections, 10 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Top left: PSNR and parameter comparison of our BMTNet and other BNNs on the MIPI dataset. Top right: PSNR and parameter comparison of our BMTNet and other FP methods on the MIPI dataset. Bottom: CFA comparison between Bayer, Quad Bayer, and Quad Bayer HybridEVS. Event pixels appear as a mixed color.
  • Figure 2: Overall architecture of BMTNet. A simple binary-convolution subnetwork first inpaints the event pixels. The main branch uses our hybrid binary Mamba-Transformer Block, which for the first time integrates Bi-Mamba with a Bi-Swin Transformer to capture both global and local features. An additional global visual branch enhances global dependencies, with Bi-Mamba handling the fusion of global features.
  • Figure 3: Model details of the bi-visual encoder and Bi-Mamba. (a) We first use the large pretrained visual encoder from RAM (zhang2024recognize) to pretrain our binarized visual encoder so that it fits Quad Bayer RAW input. (b) During the training of BMTNet, the binarized visual encoder is frozen and passes global visual embeddings to Bi-Mamba through an adapter. (c) In the binarized Mamba, we binarize all projections while keeping the core selective scan calculation in full precision, reducing computational load while maintaining performance. To further enhance the global capacity, we introduce extra global information into the input control matrix $\mathbf{B}$.
  • Figure 4: Visual comparison of all BNN methods on the Urban100 dataset. The proposed BMTNet achieves the best visual quality, effectively reducing artifacts and color aliasing.
  • Figure 5: Visual results of all compared BNN methods on the Kodak (top) and Vid4 (bottom) datasets, with corresponding heatmaps showing the pixel value differences. The proposed BMTNet exhibits less color aliasing than other BNN methods.
  • ...and 2 more figures
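
The difference between the Bayer and Quad Bayer layouts compared in Figure 1 can be sketched in a few lines of Python. The RGGB ordering and the helper names `quad_bayer_tile` and `cfa_color` are assumptions for illustration. The positions of the HybridEVS event pixels are not given in this summary, so they are not modeled.

```python
def quad_bayer_tile(bayer=("RG", "GB")):
    """Expand a 2x2 Bayer tile into a 4x4 Quad Bayer tile.

    Each Bayer color is duplicated into a 2x2 block of same-color pixels.
    Assumption: RGGB ordering for the base Bayer tile.
    """
    rows = []
    for bayer_row in bayer:
        expanded = "".join(ch * 2 for ch in bayer_row)
        rows.extend([expanded, expanded])
    return rows


def cfa_color(tile, y, x):
    # The tile repeats periodically over the sensor in both directions.
    n = len(tile)
    return tile[y % n][x % n]


tile = quad_bayer_tile()
for row in tile:
    print(row)
```

With the assumed RGGB ordering, the tile rows are `RRGG`, `RRGG`, `GGBB`, `GGBB`. Same-color pixels are therefore adjacent within each 2x2 block, but blocks of the same color sit farther apart than same-color pixels in a standard Bayer pattern.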