Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content

Evgeney Bogatyrev, Khaled Abud, Ivan Molodetskikh, Nikita Alutis, Dmitry Vatolin

TL;DR

This work tackles real-time super-resolution for heavily compressed streaming video by introducing the StreamSR benchmark and EfRLFN, an efficient SR model. StreamSR provides a large-scale, diverse set of YouTube-derived LR-HR pairs with real-world compression artifacts, enabling realistic benchmarking of 11 real-time SR models. EfRLFN combines ERLFB blocks, tanh-based refinement, and Efficient Channel Attention with a composite Charbonnier–VGG–Sobel loss to deliver superior quality and speed, outperforming contemporaries in both objective metrics and subjective user studies. The study also shows that fine-tuning existing models on StreamSR yields broad performance gains across standard benchmarks, underscoring the value of dataset-aligned training for real-time SR deployment.
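To make the composite loss named above more concrete, here is a minimal sketch of how a Charbonnier pixel term and a Sobel edge term could be combined. It is an illustration under stated assumptions, not the authors' implementation: the weights `w_pix` and `w_edge`, the value of `eps`, and the choice to apply the Charbonnier penalty to Sobel gradient magnitudes are all hypothetical. The VGG perceptual term is left out because it requires a pretrained network.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def charbonnier(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt(d^2 + eps^2) over two 2-D images."""
    total, count = 0.0, 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += math.sqrt((p - t) ** 2 + eps ** 2)
            count += 1
    return total / count


def sobel_magnitude(img):
    """Gradient magnitude on interior pixels only (no padding)."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(1, h - 1):
        row = []
        for j in range(1, w - 1):
            gx = sum(SOBEL_X[a][b] * img[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            gy = sum(SOBEL_Y[a][b] * img[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out


def composite_loss(pred, target, w_pix=1.0, w_edge=0.1, eps=1e-3):
    """Weighted sum of a pixel term and an edge term.

    The weights are hypothetical placeholders; the VGG term is omitted.
    """
    pixel = charbonnier(pred, target, eps)
    edge = charbonnier(sobel_magnitude(pred), sobel_magnitude(target), eps)
    return w_pix * pixel + w_edge * edge
```

In this sketch the inputs are single-channel images stored as nested lists of floats, at least 3 pixels on each side. The edge term penalizes differences in local gradients, which is one plausible way to discourage blurred edges in upscaled frames.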

Abstract

Recent advancements in real-time super-resolution have enabled higher-quality video streaming, yet existing methods struggle with the unique challenges of compressed video content. Commonly used datasets do not accurately reflect the characteristics of streaming media, limiting the relevance of current benchmarks. To address this gap, we introduce StreamSR, a comprehensive dataset sourced from YouTube that covers a wide range of video genres and resolutions representative of real-world streaming scenarios. We benchmark 11 state-of-the-art real-time super-resolution models to evaluate their performance for the streaming use case. Furthermore, we propose EfRLFN, an efficient real-time model that integrates Efficient Channel Attention and a hyperbolic tangent activation function, a novel design choice in the context of real-time super-resolution. We extensively optimized the architecture to maximize efficiency and designed a composite loss function that improves training convergence. EfRLFN combines the strengths of existing architectures while improving both visual quality and runtime performance. Finally, we show that fine-tuning other models on our dataset results in significant performance gains that generalize well across various standard benchmarks. We made the dataset, the code, and the benchmark available at https://github.com/EvgeneyBogatyrev/EfRLFN.
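As background for the Efficient Channel Attention mentioned in the abstract, the sketch below shows the general mechanism as it is usually described: average each channel globally, run a small 1-D convolution across the channel descriptors, pass the result through a sigmoid, and rescale each channel by its gate. The function name `eca` and the list-based tensor layout are assumptions made for illustration. How EfRLFN wires this block into ERLFB, and where the tanh activation sits, is not specified in this summary and is not modeled here.

```python
import math


def eca(features, kernel):
    """Channel attention in the style of ECA (illustrative sketch).

    features: list of C channels, each an H x W list of lists of floats.
    kernel:   odd-length list of 1-D convolution weights (learned in practice).
    Returns (rescaled_features, gates).
    """
    num_channels = len(features)
    # Global average pooling: one scalar descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]

    pad = len(kernel) // 2
    gates = []
    for i in range(num_channels):
        # 1-D convolution across neighbouring channels with zero padding.
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - pad
            if 0 <= j < num_channels:
                acc += weight * pooled[j]
        # Sigmoid turns the response into a gate in (0, 1).
        gates.append(1.0 / (1.0 + math.exp(-acc)))

    # Rescale every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch]
              for ch, g in zip(features, gates)]
    return scaled, gates
```

The only learned parameters here are the few 1-D kernel weights, so the cost does not grow with the number of channels. That small footprint is the usual argument for using this style of attention in latency-sensitive models.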

Paper Structure (35 sections, 4 equations, 13 figures, 12 tables)

Figures (13)

  • Figure 1: Left: Trade-off between user preference score and runtime speed for various $2\times$ super-resolution models. The blue line represents the Pareto-optimal front. Models achieving real-time performance are shown in green, while slower models are shown in red. "$\times$" markers denote models fine-tuned on the StreamSR dataset. Right: Examples of NVIDIA VSR artifacts compared to the proposed EfRLFN model and bicubic interpolation.
  • Figure 2: Visual summary of the proposed EfRLFN model and the comparison with the original RLFN architecture.
  • Figure 3: The process of video collection for our StreamSR dataset. The resulting set is split into train, test, and validation parts. Zoom in for better clarity.
  • Figure 4: Comparison between EfRLFN and several $4\times$ real-time SR models.
  • Figure 5: (a) Pairwise preference evaluation of EfRLFN against other real-time super-resolution methods. (b) A comparison of the output feature maps from the first, third, and sixth ERLFB blocks. The features are taken from the output of the ECA block within each ERLFB.
  • ...and 8 more figures