MAMAT: 3D Mamba-Based Atmospheric Turbulence Removal and its Object Detection Capability

Paul Hill, Zhiming Liu, Nantheera Anantrasirichai

TL;DR

This work addresses atmospheric turbulence in long-range video by introducing MAMAT, a dual-module framework that combines deformable 3D registration (SDAT) with a 3D Mamba-based enhancement module (EDP). The SDAT module performs non-rigid frame alignment using deformable 3D convolutions across multiple scales, while EDP uses a UNet-like 3D Mamba backbone with a Selective State Space Model (SSM) to enhance contrast and texture over time. The approach achieves higher restoration quality (PSNR/SSIM) than state-of-the-art learning-based methods and improves object detection across multiple detectors, with notable gains for small objects. These results show that integrating spatiotemporal modeling with 3D Mamba/SSM into turbulence mitigation yields better visualization and more reliable detection under turbulent conditions.
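
To make the Selective State Space Model idea concrete, here is a minimal NumPy sketch of a selective scan over a token sequence. It is an illustration under stated assumptions, not the authors' implementation: the function name `selective_scan`, the parameter names and shapes, and the elementwise step-size projection are all hypothetical simplifications. For video, the tokens would presumably come from flattening the (time, height, width) volume in some scan order, which this summary does not specify.

```python
import numpy as np


def selective_scan(x, A, W_B, W_C, w_delta):
    """Simplified selective state-space scan (illustrative only).

    x       : (T, D) token sequence, e.g. a flattened video volume (assumption)
    A       : (D, N) negative values, one diagonal state matrix per channel
    W_B, W_C: (D, N) projections that make B and C depend on the input
    w_delta : (D,) elementwise projection for the step size (simplification)
    """
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))           # hidden state carried along the sequence
    y = np.zeros((T, D))
    for t in range(T):
        # Input-dependent step size via softplus, so it is always positive.
        delta = np.logaddexp(0.0, x[t] * w_delta)        # (D,)
        B = x[t] @ W_B                                    # (N,)
        C = x[t] @ W_C                                    # (N,)
        # Zero-order-hold style discretisation of the diagonal state matrix.
        A_bar = np.exp(delta[:, None] * A)                # (D, N)
        # State update: decay the old state, then write the current token.
        h = A_bar * h + (delta[:, None] * B[None, :]) * x[t][:, None]
        # Read-out: project the state back to one value per channel.
        y[t] = h @ C
    return y
```

Because `delta`, `B`, and `C` are computed from the current token, the recurrence can decide per token how much past state to keep. That input dependence is what "selective" refers to. The loop is written sequentially for clarity; practical implementations use a parallel scan.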

Abstract

Restoration and enhancement are essential for improving the quality of videos captured under atmospheric turbulence, aiding visualization, object detection, classification, and tracking in surveillance systems. In this paper, we introduce a novel Mamba-based method, the 3D Mamba-Based Atmospheric Turbulence Removal (MAMAT), which employs a dual-module strategy to mitigate these distortions. The first module uses deformable 3D convolutions for non-rigid registration to minimize spatial shifts, while the second module enhances contrast and detail. Experimental results demonstrate that MAMAT, by leveraging the 3D Mamba architecture, outperforms state-of-the-art learning-based methods, achieving up to a 3% improvement in visual quality and a 15% boost in object detection. It thus not only enhances visualization but also significantly improves object detection accuracy, bridging the gap between visual restoration and the effectiveness of surveillance applications.
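
Restoration quality of this kind is commonly reported as PSNR against a clean reference. The sketch below shows the standard PSNR formula averaged over frames. The helper names `psnr` and `video_psnr`, and the assumption that frames are arrays scaled to [0, 1], are illustrative and not taken from the paper's evaluation code.

```python
import numpy as np


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped frames."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")        # identical frames
    return 10.0 * np.log10(max_value ** 2 / mse)


def video_psnr(ref_frames, out_frames, max_value=1.0):
    """Mean per-frame PSNR over a clip (one common convention, assumed here)."""
    scores = [psnr(r, o, max_value) for r, o in zip(ref_frames, out_frames)]
    return float(np.mean(scores))
```

For example, a frame that differs from its reference by a constant 0.1 on a [0, 1] scale has an MSE of 0.01 and therefore a PSNR of 20 dB.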

Paper Structure

This paper contains 19 sections, 5 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Diagram of the proposed MAMAT framework.
  • Figure 2: Subjective results comparing (from left to right) the distorted frame, the restored frames using TMT, DATUM, and our MAMAT, as well as the clean frame used as ground truth.
  • Figure 3: Subjective results of applying the YOLOv11 large model to the outputs of different restoration methods, as well as to distorted and clean videos (ground truth).