MambaX: Image Super-Resolution with State Predictive Control
Chenyu Li, Danfeng Hong, Bing Zhang, Zhaojie Pan, Naoto Yokoya, Jocelyn Chanussot
TL;DR
MambaX reframes image super-resolution as a latent state-space process and introduces nonlinear state predictive control (nSPC) to dynamically learn multistage differential coefficients. It pairs a progressive cross-domain transition, which aligns degradations, with a cross-control fusion mechanism for multimodal SR. A convergence analysis establishes universal approximation capabilities for the resulting state-space models. Empirically, MambaX delivers state-of-the-art or near state-of-the-art performance on computer vision and remote-sensing SR tasks in both single- and multimodal settings, and is robust to domain shifts and modality gaps. The approach advances spectrally generalized SR by enabling adaptive, learnable control matrices within a stable state-space framework, and could be extended further with uncertainty-aware modeling.
Abstract
Image super-resolution (SR) is a critical technology for overcoming the inherent hardware limitations of sensors. However, existing approaches mainly focus on directly enhancing the final resolution and often neglect effective control over error propagation and accumulation during intermediate stages. Recently, Mamba has emerged as a promising approach that represents the entire reconstruction process as a state sequence with multiple nodes, allowing for intermediate intervention. Nonetheless, its fixed linear mapper has a narrow receptive field and restricted flexibility, which hampers its effectiveness on fine-grained images. To address this, we propose MambaX, a nonlinear state predictive control model that maps consecutive spectral bands into a latent state space and generalizes the SR task by dynamically learning the nonlinear state parameters of the control equations. Compared to existing sequence models, MambaX 1) employs dynamic state predictive control learning to approximate the nonlinear differential coefficients of state-space models; 2) introduces a novel state cross-control paradigm for multimodal SR fusion; and 3) utilizes progressive transitional learning to mitigate heterogeneity caused by domain and modality shifts. Our evaluation demonstrates the superior performance of the dynamic spectrum-state representation model in both single-image SR and multimodal fusion-based SR tasks, highlighting its substantial potential to advance spectrally generalized modeling across arbitrary dimensions and modalities.
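
To make the contrast between a fixed linear mapper and dynamically learned state coefficients concrete, the following is a minimal illustrative sketch, not the MambaX implementation. It treats a short sequence of spectral-band values as the input to a scalar state recurrence h_t = a_t * h_{t-1} + b_t * x_t. The function names, the sigmoid gate, and the parameters `a`, `b`, `c`, and `w` are all assumptions introduced here for illustration; the abstract does not specify the actual form of the control equations.

```python
import math


def sigmoid(z):
    """Logistic function, used here as a hypothetical gate."""
    return 1.0 / (1.0 + math.exp(-z))


def fixed_scan(xs, a=0.5, b=0.5, c=1.0):
    """Fixed linear recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    The coefficients a and b are the same at every step.
    """
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def input_dependent_scan(xs, w=1.0, c=1.0):
    """Same recurrence, but a_t and b_t are nonlinear functions of x_t.

    The gate a_t = sigmoid(w * x_t) is an assumption for illustration only;
    in a trained model, w would be a learned parameter.
    """
    h, ys = 0.0, []
    for x in xs:
        a_t = sigmoid(w * x)  # how much of the previous state to keep
        b_t = 1.0 - a_t       # how much of the current input to admit
        h = a_t * h + b_t * x
        ys.append(c * h)
    return ys


# Hypothetical per-band intensities standing in for consecutive spectral bands.
bands = [0.2, 0.4, 0.1, 0.9]
print(fixed_scan(bands))
print(input_dependent_scan(bands))
```

In the fixed version, every band is mixed into the state with the same weights, whereas the input-dependent version lets each band modulate how strongly the state is retained or overwritten. This is only the general idea behind input-conditioned state-space coefficients; the nonlinear state predictive control described above may differ substantially in form.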
