Implicit Neural Representation for Video and Image Super-Resolution
Mary Aiyetigbo, Wanqi Yuan, Feng Luo, Nianyi Li
TL;DR
SR-INR introduces a unified implicit neural representation for both image and video super-resolution, reconstructing $I_{hr}$ from $I_{lr}$ via a high-resolution grid $\boldsymbol{\mathcal{G}}_{hr}$ and multi-resolution hash-encoded texture features. By combining texture encoding, implicit hashing over a 6D latent space, and a top-down attention mechanism, the method captures spatial and temporal details without explicit motion estimation. A Pixel-Error Amplified Loss (PEA-loss) further refines fine details while mitigating over-smoothing. Across image and video benchmarks, SR-INR delivers competitive or superior results with a simpler, more efficient architecture, and ablations reveal favorable trade-offs guiding architectural choices. This unified INR-based approach highlights the potential of grid-based, hash-encoded representations for scalable, temporally stable SR in real-world applications.
Abstract
We present a novel approach for super-resolution that utilizes implicit neural representation (INR) to effectively reconstruct and enhance low-resolution videos and images. By leveraging the capacity of neural networks to implicitly encode spatial and temporal features, our method facilitates high-resolution reconstruction using only low-resolution inputs and a 3D high-resolution grid. This results in an efficient solution for both image and video super-resolution. Our proposed method, SR-INR, maintains consistent details across frames and images, achieving impressive temporal stability without relying on the computationally intensive optical flow or motion estimation typically used in other video super-resolution techniques. The simplicity of our approach contrasts with the complexity of many existing methods, making it both effective and efficient. Experimental evaluations show that SR-INR delivers results on par with or superior to state-of-the-art super-resolution methods, while maintaining a more straightforward structure and reduced computational demands. These findings highlight the potential of implicit neural representations as a powerful tool for reconstructing high-quality, temporally consistent video and image signals from low-resolution data.
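To make the idea of querying features on a high-resolution grid more concrete, the following is a minimal NumPy sketch of a generic multi-resolution hash lookup over normalized $(x, y, t)$ coordinates. It is not the authors' implementation: the function names `make_hr_grid` and `hash_encode`, the table sizes, the `base_res` and `growth` parameters, and the nearest-vertex lookup (instead of interpolation) are all assumptions chosen for illustration, and the sketch uses a 3D coordinate grid rather than the 6D latent space mentioned in the TL;DR.

```python
import numpy as np

# Large primes commonly used for spatial hashing (an illustrative choice).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)


def make_hr_grid(height, width, frames):
    """Return a (frames*height*width, 3) array of normalized (x, y, t) in [0, 1]."""
    t, y, x = np.meshgrid(
        np.linspace(0.0, 1.0, frames),
        np.linspace(0.0, 1.0, height),
        np.linspace(0.0, 1.0, width),
        indexing="ij",
    )
    return np.stack([x, y, t], axis=-1).reshape(-1, 3)


def hash_encode(coords, tables, base_res=4, growth=2.0):
    """Concatenate per-level features looked up from hashed grid vertices.

    Simplification (assumption): each coordinate reads only its lower grid
    vertex; a full encoder would interpolate between neighboring vertices.
    """
    feats = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)            # grid resolution at this level
        idx = np.floor(coords * res).astype(np.uint64)   # integer vertex per axis
        h = idx * PRIMES                                  # uint64 arithmetic wraps on overflow
        key = (h[:, 0] ^ h[:, 1] ^ h[:, 2]) % np.uint64(table.shape[0])
        feats.append(table[key.astype(np.int64)])
    return np.concatenate(feats, axis=-1)


# Usage: a tiny 2-frame, 4x6 high-resolution grid with three hash levels.
rng = np.random.default_rng(0)
grid = make_hr_grid(height=4, width=6, frames=2)
tables = [rng.normal(size=(64, 2)) for _ in range(3)]  # 64 entries, 2 features each
feats = hash_encode(grid, tables)
print(feats.shape)  # (48, 6)
```

In a setup like this, the concatenated features would typically be passed to a small network that predicts the color at each high-resolution coordinate; the table entries would be learned rather than drawn at random as they are here.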
