MambaLiteSR: Image Super-Resolution with Low-Rank Mamba using Knowledge Distillation

Romina Aalishah, Mozhgan Navardi, Tinoosh Mohsenin

TL;DR

This work addresses the challenge of deploying image super-resolution on resource-constrained edge devices by proposing MambaLiteSR, a lightweight Vision Mamba-based model augmented with low-rank Mamba and knowledge distillation from a larger teacher. The method optimizes embedding dimension, employs a low-rank factorization to reduce computations, and trains a compact student model that closely matches a stronger teacher’s SR performance. Experimental results demonstrate a 15% parameter reduction with competitive PSNR/SSIM and up to 58% power savings, plus significant training energy reductions via low-rank design, validated on NVIDIA Jetson Orin Nano. The approach offers a practical pathway for real-time, energy-efficient SR on edge hardware while maintaining accuracy comparable to state-of-the-art edge models.

Abstract

Generative Artificial Intelligence (AI) has gained significant attention in recent years, revolutionizing various applications across industries. Among these, advanced vision models for image super-resolution are in high demand, particularly for deployment on edge devices where real-time processing is crucial. However, deploying such models on edge devices is challenging due to limited computing power and memory. In this paper, we present MambaLiteSR, a novel lightweight image Super-Resolution (SR) model that utilizes the architecture of Vision Mamba. It integrates State Space Blocks and a reconstruction module for efficient feature extraction. To optimize efficiency without affecting performance, MambaLiteSR employs knowledge distillation to transfer key insights from a larger Mamba-based teacher model to a smaller student model via hyperparameter tuning. Through mathematical analysis of model parameters and their impact on PSNR, we identify key factors and adjust them accordingly. Our comprehensive evaluation shows that MambaLiteSR outperforms state-of-the-art edge SR methods by reducing power consumption while maintaining competitive PSNR and SSIM scores across benchmark datasets. It also reduces power usage during training via low-rank approximation. Moreover, MambaLiteSR reduces parameters with minimal performance loss, enabling efficient deployment of generative AI models on resource-constrained devices. Deployment on the embedded NVIDIA Jetson Orin Nano confirms the superior balance of MambaLiteSR size, latency, and efficiency. Experiments show that MambaLiteSR achieves performance comparable to both the baseline and other edge models while using 15% fewer parameters. It also improves power consumption by up to 58% compared to state-of-the-art SR edge models, all while maintaining low energy use during training.
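
The parameter savings behind a low-rank design can be illustrated with a short count. The sketch below is a generic low-rank factorization of one projection matrix, not the authors' implementation: the function names, the 64-dimensional sizes, and the toy matrices are all assumptions chosen for illustration, while the ranks 2 and 30 are the two values compared in Figure 4.

```python
def dense_params(d_in, d_out):
    """Weights in a full d_out x d_in projection matrix (bias ignored)."""
    return d_in * d_out


def low_rank_params(d_in, d_out, rank):
    """Weights in a factorized projection W ~= B @ A, where
    A has shape (rank, d_in) and B has shape (d_out, rank)."""
    return rank * (d_in + d_out)


def matmul(b, a):
    """Plain-Python matrix product, used only to show the output shape."""
    inner = len(a)
    return [[sum(b[i][k] * a[k][j] for k in range(inner))
             for j in range(len(a[0]))]
            for i in range(len(b))]


# Hypothetical layer sizes; the paper's actual dimensions are not stated here.
D_IN, D_OUT = 64, 64
for rank in (2, 30):
    factored = low_rank_params(D_IN, D_OUT, rank)
    full = dense_params(D_IN, D_OUT)
    print(f"rank={rank}: {factored} factored weights vs {full} dense weights")

# Toy factors: B is 3 x 2 and A is 2 x 4, so the product is 3 x 4
# regardless of the rank (the shared inner dimension, here 2).
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
W = matmul(B, A)
print(len(W), "x", len(W[0]))
```

The rank only changes the size of the two factors and the cost of multiplying them. The reconstructed matrix keeps the same outer dimensions, which is why a smaller rank can lower compute without changing the layer's interface.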

Paper Structure

This paper contains 14 sections, 7 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Architecture of DVMSR [lei2024dvmsr] and Vision Mamba [visionmamba]: The input image is preprocessed and fed into the DVMSR model, which consists of Vision Mamba modules, convolution layers, and a decoder.
  • Figure 2: High-level overview of the MambaLiteSR process: A low-resolution input image ($64 \times 64$) is preprocessed and fed into the knowledge distillation process, which generates the high-resolution output ($256 \times 256$). Weighted distillation and student losses enable the student model to learn efficiently under teacher supervision. A properly chosen embedding dimension, which determines the size of the feature vector produced from the image patches, makes the model suitable for edge devices.
  • Figure 3: (a) To measure dynamic power on the embedded NVIDIA Jetson Orin Nano, the student ONNX model is converted to TensorRT format before the measurement starts. (b) The plot shows the instantaneous power usage over time for 1000 samples, extracted using the tegrastats utility [tegrastats].
  • Figure 4: Comparison of MambaLiteSR teacher model training with $\mathrm{rank} = 2$ and $\mathrm{rank} = 30$ over 1500 iterations: (a) depicts the validation PSNR. (b) depicts the moving average of GPU power usage every 100 seconds with a window size of 100. At runtime, the larger rank requires more FLOPs and therefore draws more power under load. The training outcome and model performance are similar because, as indicated by Equation \ref{eq:rank}, the matrix ultimately remains the same. The reported power measurements were taken on an NVIDIA GeForce RTX 4090 on a Lambda GPU server [lambda] using the wandb dashboard [wandb].
  • Figure 5: Model performance on the validation set as $\alpha$ varies, suggesting a potential inconsistency in gradient values between the learned teacher and the ground truth. The reported PSNRs are after 1000 iterations of training for teacher and student models.
  • ...and 1 more figures
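
The weighted distillation and student losses mentioned in Figures 2 and 5 can be sketched as a convex combination controlled by $\alpha$. This is a minimal sketch under stated assumptions, not the paper's exact objective: the use of an L1 loss for both terms, the convention that $\alpha$ weights the distillation term, and the names `l1_loss` and `distillation_loss` are all hypothetical.

```python
def l1_loss(pred, target):
    """Mean absolute error between two equally sized pixel sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def distillation_loss(student_out, teacher_out, ground_truth, alpha):
    """Weighted sum of a distillation term and a student term.

    Assumption: alpha weights the student-vs-teacher term and
    (1 - alpha) weights the student-vs-ground-truth term.
    """
    distill_term = l1_loss(student_out, teacher_out)
    student_term = l1_loss(student_out, ground_truth)
    return alpha * distill_term + (1.0 - alpha) * student_term


# Toy two-pixel example with made-up values.
student = [0.5, 0.5]
teacher = [0.6, 0.4]
truth = [1.0, 0.0]
for alpha in (0.0, 0.2, 1.0):
    print(alpha, distillation_loss(student, teacher, truth, alpha))
```

With $\alpha = 0$ the student is trained on the ground truth alone, and with $\alpha = 1$ it only imitates the teacher. Intermediate values trade off the two signals, which is the quantity swept in Figure 5.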