MambaX: Image Super-Resolution with State Predictive Control

Chenyu Li, Danfeng Hong, Bing Zhang, Zhaojie Pan, Naoto Yokoya, Jocelyn Chanussot

TL;DR

MambaX reframes image super-resolution as a latent state-space process and introduces nonlinear state predictive control (nSPC) to dynamically learn multistage differential coefficients. It combines a progressive cross-domain transition, which aligns degradations, with a cross-control fusion mechanism for multimodal SR, and it is underpinned by a convergence analysis that establishes universal approximation capabilities for the resulting state-space models. Empirically, MambaX delivers state-of-the-art or near state-of-the-art performance on computer vision and remote-sensing SR tasks in both single- and multimodal settings, demonstrating robustness to domain shifts and modality gaps. The approach advances spectrally generalized SR by enabling adaptive, learnable control matrices within a stable state-space framework, with potential for further enhancement via uncertainty-aware extensions.

Abstract

Image super-resolution (SR) is a critical technology for overcoming the inherent hardware limitations of sensors. However, existing approaches mainly focus on directly enhancing the final resolution, often neglecting effective control over error propagation and accumulation during intermediate stages. Recently, Mamba has emerged as a promising approach that can represent the entire reconstruction process as a state sequence with multiple nodes, allowing for intermediate intervention. Nonetheless, its fixed linear mapper is limited by a narrow receptive field and restricted flexibility, which hampers its effectiveness on fine-grained images. To address this, we propose MambaX, a nonlinear state predictive control model that maps consecutive spectral bands into a latent state space and generalizes the SR task by dynamically learning the nonlinear state parameters of control equations. Compared to existing sequence models, MambaX 1) employs dynamic state predictive control learning to approximate the nonlinear differential coefficients of state-space models; 2) introduces a novel state cross-control paradigm for multimodal SR fusion; and 3) utilizes progressive transitional learning to mitigate heterogeneity caused by domain and modality shifts. Our evaluation demonstrates the superior performance of the dynamic spectrum-state representation model in both single-image SR and multimodal fusion-based SR tasks, highlighting its substantial potential to advance spectrally generalized modeling across arbitrary dimensions and modalities.
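
The contrast between a fixed linear mapper and dynamically learned state parameters can be sketched in a few lines. The following is a minimal illustration, not the MambaX implementation: it assumes a discrete recurrence $h_t = A h_{t-1} + B_t u_t$, $y_t = C h_t$ with spectral bands treated as sequence steps, and every name in it (`ssm_scan`, `B_fixed`, `W`) is hypothetical rather than taken from the paper.

```python
import numpy as np

def ssm_scan(u, A, C, B_fn):
    """Run h_t = A h_{t-1} + B_t u_t, y_t = C h_t over u of shape (T, d_in).

    B_fn maps the current input u_t to the input matrix B_t of shape
    (d_state, d_in). A constant B_fn gives a fixed linear state-space model;
    an input-dependent B_fn gives a simple stand-in for dynamic parameters.
    """
    h = np.zeros(A.shape[0])           # initial latent state
    outputs = []
    for u_t in u:                      # one step per band (assumed ordering)
        B_t = B_fn(u_t)
        h = A @ h + B_t @ u_t          # state update
        outputs.append(C @ h)          # readout
    return np.stack(outputs)

rng = np.random.default_rng(0)
T, d_in, d_state = 6, 4, 8             # toy sizes, chosen for illustration
u = rng.normal(size=(T, d_in))
A = 0.9 * np.eye(d_state)              # stable transition (assumption)
C = rng.normal(size=(d_in, d_state))
B_fixed = rng.normal(size=(d_state, d_in))
W = rng.normal(size=(d_state * d_in, d_in))  # hypothetical projection

def fixed_B(u_t):
    # Same input matrix at every step, regardless of the input.
    return B_fixed

def dynamic_B(u_t):
    # Input matrix modulated by a nonlinear function of the current input.
    return B_fixed + np.tanh(W @ u_t).reshape(d_state, d_in)

y_fixed = ssm_scan(u, A, C, fixed_B)
y_dynamic = ssm_scan(u, A, C, dynamic_B)
```

With a constant input matrix the scan is linear in its input, so doubling `u` doubles the output. Once $B_t$ depends on $u_t$ this no longer holds, which is the sense in which the second variant is nonlinear.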

Paper Structure

This paper contains 27 sections, 3 theorems, 29 equations, 11 figures, 4 tables.

Key Result

Lemma 2.1

For any continuous function $f$ over a compact set $K$, there exists a sequence of state-space models that can approximate the sequence relationship $f:(u_1,\cdots,u_T)\mapsto(f(u_1),\cdots,f(u_T))$.
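
The summary does not say in which sense the models approximate $f$. One common reading, given here only as an assumption, is uniform convergence on $K$ at every step, where $\mathcal{S}_n$ is hypothetical notation for the $n$-th state-space model in the sequence and $\lVert\cdot\rVert$ is any fixed norm on the output space:

$$
\lim_{n\to\infty}\ \sup_{u_1,\dots,u_T\in K}\ \max_{1\le t\le T}\ \bigl\lVert \mathcal{S}_n(u_1,\dots,u_T)_t - f(u_t) \bigr\rVert = 0 .
$$

Under this reading the target is a pointwise map, so an approximating model only needs its output at step $t$ to depend on $u_t$. The paper's own statement and proof may use a different mode of convergence.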

Figures (11)

  • Figure 1: Comparison of linear and nonlinear control matrix operators. (a) and (b) are the input low-resolution image and a local enlargement of the ground truth (GT). (c) and (e) illustrate the effective receptive field at the center point of the red box, highlighting that the dynamic approach provides a broader receptive field. (d) and (f) further compare the intermediate feature maps of both methods, demonstrating that the dynamic method produces feature maps with finer details, primarily due to its larger local detail receptive field. (g) shows the mean squared error (MSE) results of the two local enlargements.
  • Figure 2: An illustrative workflow of the proposed MambaX, in which each state $t$ is unfolded into a cross-domain transition and nonlinear state predictive control (nSPC). (a) shows the cross-domain transition process, highlighting degradation alignment and domain transfer. (b) provides a detailed view of the block structure in MambaX. (c) formulates the nSPC.
  • Figure 3: Visual quality comparison on the CAVE dataset. The first row shows the pseudo-color image (50,40,30) of the x8 SR result. The second row shows the MSE between the x8 SR result and the ground truth. The last row shows the local enlargement of the MSE. The bar chart on the right displays the RMSE values for each algorithm.
  • Figure 4: Visual quality comparison on the Chikusei dataset. The first and second rows show the pseudo-color image (50,40,30) of the x8 and x2 SR results, respectively. The third row shows the local enlargement of the mean squared error (MSE) between the x2 SR results and the ground truth.
  • Figure 5: Visual quality comparison on WV3 dataset. The first row shows the true color images of different results, and the second row shows the local enlargement of the mean squared error (MSE) results and the fourth channel result.
  • ...and 6 more figures

Theorems & Definitions (3)

  • Lemma 2.1
  • Theorem 2.2
  • Proposition 2.3