SkipSR: Faster Super Resolution with Token Skipping
Rohan Choudhury, Shanchuan Lin, Jianyi Wang, Hao Chen, Qi Zhao, Feng Cheng, Lu Jiang, Kris Kitani, Laszlo A. Jeni
TL;DR
SkipSR addresses the scalability of diffusion-based video SR by predicting which patches of the low-resolution input are low-detail and skipping them: only the complex patches are routed through the transformer, and the skipped patches are filled in with fast upsampling. A lightweight mask predictor operating in the latent space selects the patches, and RoPE-adapted position handling keeps the output consistent when some patches bypass the transformer, yielding substantial wall-clock speedups without perceptual quality loss. The skip-aware diffusion pathway is combined with training strategies that include one-step distillation and adversarial post-training. The method reduces end-to-end latency by up to 60% on 720p video SR (and diffusion time by up to 70% at 1080p) while matching SeedVR/SeedVR2 quality on real-world and AI-generated data. This makes diffusion-based restoration and generation more practical for longer sequences and higher resolutions, with the speed-quality tradeoff governed by a tunable threshold $\tau$.
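The routing idea can be illustrated with a minimal sketch, which is not the authors' implementation. It makes several assumptions: per-patch pixel variance stands in for the learned mask predictor, nearest-neighbour upsampling stands in for the fast upsampling path, the `refine` callable is a hypothetical placeholder for the diffusion transformer, and a patch is refined when its score exceeds `tau`. Patches are also refined independently here, whereas the actual model processes the kept patches jointly with RoPE-adapted positions. All function names are illustrative.

```python
from statistics import pvariance
from typing import Callable, Dict, List, Tuple

Patch = List[List[float]]  # a small 2D block of pixel (or latent) values


def split_into_patches(frame: Patch, p: int) -> Dict[Tuple[int, int], Patch]:
    """Map (row, col) patch indices to p x p blocks; assumes H and W divisible by p."""
    h, w = len(frame), len(frame[0])
    return {
        (i // p, j // p): [row[j:j + p] for row in frame[i:i + p]]
        for i in range(0, h, p)
        for j in range(0, w, p)
    }


def upsample_nearest(patch: Patch, scale: int) -> Patch:
    """Cheap path: nearest-neighbour upsampling by an integer factor."""
    out: Patch = []
    for row in patch:
        wide = [v for v in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


def detail_score(patch: Patch) -> float:
    """Assumed stand-in for the learned mask predictor: pixel variance."""
    return pvariance([v for row in patch for v in row])


def skip_sr(
    frame: Patch,
    p: int,
    scale: int,
    tau: float,
    refine: Callable[[Patch], Patch],
) -> Tuple[Patch, Dict[Tuple[int, int], bool]]:
    """Upsample every patch cheaply; call `refine` only where score > tau."""
    patches = split_into_patches(frame, p)
    mask = {idx: detail_score(patch) > tau for idx, patch in patches.items()}
    h, w = len(frame), len(frame[0])
    out = [[0.0] * (w * scale) for _ in range(h * scale)]
    for (pi, pj), patch in patches.items():
        up = upsample_nearest(patch, scale)
        if mask[(pi, pj)]:
            up = refine(up)  # expensive path, only for high-detail patches
        r0, c0 = pi * p * scale, pj * p * scale
        for dr, row in enumerate(up):
            out[r0 + dr][c0:c0 + len(row)] = row  # paste at original position
    return out, mask
```

Under this sketch's convention, raising `tau` marks more patches as low-detail and skips them, trading quality for speed. Each patch is pasted back at its original grid position regardless of which path produced it.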
Abstract
Diffusion-based super-resolution (SR) is a key component in video generation and video restoration, but it is slow and expensive, which limits scalability to higher resolutions and longer videos. Our key insight is that many regions in a video are inherently low-detail and gain little from refinement, yet current methods process all pixels uniformly. To take advantage of this, we propose SkipSR, a simple framework for accelerating video SR that identifies low-detail regions directly from the low-resolution input and skips computation on them entirely, super-resolving only the areas that require refinement. This strategy preserves perceptual quality in both standard and one-step diffusion SR models while significantly reducing computation. On standard SR benchmarks, our method achieves up to 60% faster end-to-end latency than prior models on 720p videos with no perceptible loss in quality. Video demos are available at https://rccchoudhury.github.io/skipsr/
