Scan Clusters, Not Pixels: A Cluster-Centric Paradigm for Efficient Ultra-high-definition Image Restoration

Chen Wu; Ling Wang; Zhuoran Zheng; Yuning Cui; Zhixiong Yang; Xiangyu Chen; Yue Zhang; Weidong Jiang; Jingyuan Xia

TL;DR

This paper introduces C$^2$SSM, a visual state space model that shifts from pixel-serial to cluster-serial scanning and charts a new course for efficient large-scale vision: scan clusters, not pixels.

Abstract

Ultra-High-Definition (UHD) image restoration is trapped in a scalability crisis: existing models, bound to pixel-wise operations, demand unsustainable computation. While state space models (SSMs) like Mamba promise linear complexity, their pixel-serial scanning remains a fundamental bottleneck for the millions of pixels in UHD content. We ask: must we process every pixel to understand the image? This paper introduces C$^2$SSM, a visual state space model that breaks this taboo by shifting from pixel-serial to cluster-serial scanning. Our core discovery is that the rich feature distribution of a UHD image can be distilled into a sparse set of semantic centroids via a neural-parameterized mixture model. C$^2$SSM leverages this to reformulate global modeling into a novel dual-path process: it scans and reasons over a handful of cluster centers, then diffuses the global context back to all pixels through a principled similarity distribution, all while a lightweight modulator preserves fine details. This cluster-centric paradigm achieves a decisive leap in efficiency, slashing computational costs while establishing new state-of-the-art results across five UHD restoration tasks. More than a solution, C$^2$SSM charts a new course for efficient large-scale vision: scan clusters, not pixels.
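The dual-path idea in the abstract (aggregate pixels into a few cluster centers, scan the centers, then diffuse the result back to every pixel) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the softmax assignment stands in for the neural-parameterized mixture model, the fixed-decay recurrence stands in for the state space scan, the detail-preserving modulator is omitted, and all function names and parameters (`soft_assign`, `aggregate`, `scan_centers`, `cluster_scan`, `temperature`, `decay`) are hypothetical.

```python
import numpy as np

def soft_assign(features, centers, temperature=1.0):
    """Similarity distribution of each pixel over the K centers (N x K, rows sum to 1)."""
    logits = features @ centers.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def aggregate(features, assign):
    """Feature aggregating: assignment-weighted mean of pixel features per cluster (K x C)."""
    mass = assign.sum(axis=0)[:, None] + 1e-8
    return assign.T @ features / mass

def scan_centers(centers, decay=0.9):
    """Toy sequential scan h_k = decay * h_{k-1} + x_k over the K centers only."""
    state = np.zeros(centers.shape[1])
    out = np.empty_like(centers)
    for k, x in enumerate(centers):
        state = decay * state + x
        out[k] = state
    return out

def cluster_scan(features, init_centers):
    """Aggregate pixels to centers, scan the centers, diffuse the context back (N x C)."""
    assign = soft_assign(features, init_centers)
    centers = aggregate(features, assign)
    context = scan_centers(centers)
    return assign @ context  # score diffusing: each pixel mixes the scanned centers

# Assumed toy sizes: a 64 x 64 feature map, 8 channels, 16 clusters.
rng = np.random.default_rng(0)
n_pixels, channels, n_clusters = 64 * 64, 8, 16
features = rng.standard_normal((n_pixels, channels))
init_centers = rng.standard_normal((n_clusters, channels))

out = cluster_scan(features, init_centers)
print(out.shape)  # (4096, 8)

# Sequential scan length at 4K resolution: pixels versus clusters.
uhd_pixels = 3840 * 2160
print(uhd_pixels, "pixel steps vs", n_clusters, "cluster steps")
```

In this sketch the sequential recurrence runs for 16 steps instead of one step per pixel. The aggregation and diffusion steps still touch every pixel, but they are dense matrix products that parallelize well, unlike a serial scan.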

Paper Structure (15 sections, 9 equations, 5 figures, 10 tables)

Introduction
Related Work
  State Space Model in Image Restoration
  UHD Image Restoration
Methodology
  Overall Architecture
  Cluster-Centric Scanning Module
    Feature Aggregating
    Score Diffusing
  Spatial-Channel Feature Modulator
Experiments
  Experimental Settings
  Comparisons with the State-of-the-art Methods
  Ablation Studies and Discussions
Conclusion

Figures (5)

Figure 1: The scanning strategies in existing Mamba-based methods and our proposed method. (a) VMamba employs a Z-shaped scan path that incurs VRAM bottlenecks when processing UHD images due to its full-pixel scanning. (b) EfficientVMamba reduces scanning costs by omitting sampling steps, but this compromises global modeling accuracy. (c) The proposed cluster-centric scanning strategy.
Figure 2: The overview of our proposed C$^2$SSM. C$^2$SSM employs an asymmetric U-Net architecture whose decoder integrates the Cluster-Centric Scanning Module and Spatial-Channel Feature Modulator to achieve spatial-channel global feature coupling.
Figure 3: Visual quality comparisons on the UHD-LOL4K dataset (LLFormer). The last row shows the color histogram of the image.
Figure 4: Visual quality comparisons on the 4K-Rain13k dataset (UDR-Mixer). The last row shows the error map of the image.
Figure 5: Visual quality comparisons on the UHD-Blur dataset (UHDformer). The last row shows the color histogram of the image.
