Can No-Reference Quality-Assessment Methods Serve as Perceptual Losses for Super-Resolution?

Egor Kashkarov, Egor Chistov, Ivan Molodetskikh, Dmitriy Vatolin

TL;DR

The paper investigates whether no-reference IQA (NR-IQA) methods can serve as perceptual losses for video super-resolution (VSR). By evaluating multiple VSR architectures and a broad set of NR-IQA losses (with comparisons to LPIPS and PieAPP), the study shows that naive optimization often introduces artifacts and can degrade the scores of other IQA methods, while a carefully designed training procedure that combines NR-IQA with LPIPS can mitigate these issues. The findings indicate that NR-IQA losses alone are not reliably beneficial and that cross-IQA interactions must be considered; nonetheless, some NR-IQA methods (e.g., PaQ-2-PiQ, CLIP-IQA) show artifact-free behavior in certain configurations. The work highlights practical implications for designing perceptual losses in VSR and suggests directions for more robust loss compositions and gradient-based weight selection.

Abstract

Perceptual losses play an important role in constructing deep-neural-network-based methods by increasing the naturalness and realism of processed images and videos. Use of perceptual losses is often limited to LPIPS, a full-reference method. Even though deep no-reference image-quality-assessment methods are excellent at predicting human judgment, little research has examined their incorporation in loss functions. This paper investigates direct optimization of several video-super-resolution models using no-reference image-quality-assessment methods as perceptual losses. Our experimental results show that straightforward optimization of these methods produces artifacts, but a special training procedure can mitigate them.

Paper Structure (10 sections, 1 equation, 3 figures, 1 table)

Figures (3)

  • Figure 1: BasicVSR++ on a test sequence from Vimeo-90K. Fine-tuning with MANIQA produces green dotted edges and "eyes," while fine-tuning using LPIPS combined with MANIQA does not. Although the bottom row exhibits a barely noticeable difference, optimizing the two losses yields a higher average gain over the 11 IQA methods we evaluated.
  • Figure 2: Relative gain for IQA methods after optimization with different IQA-method combinations. Rows represent an additional component to the Charbonnier loss function, and columns represent the averaged relative gain for different IQA metrics. We averaged the results over BasicVSR++, iSeeBetter, VRT, and three datasets. Although LPIPS and PieAPP react badly to the tuning of most no-reference IQA methods, loss-function combinations improve the judgment of nearly all IQA methods.
  • Figure 3: BasicVSR++ on a test sequence from REDS. The top row is a baseline model trained with Charbonnier loss only. Fine-tuning with DBCNN, NIMA, MDTVSFA, and MANIQA produces "colored dot" artifacts, and fine-tuning with HyperIQA blurs the image. Even when used in combinations, NIMA still creates artifacts.
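
The loss setup these captions refer to, a Charbonnier base loss with additional IQA components, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names, the weights `w_fr` and `w_nr`, the sign convention (the no-reference score is assumed to be higher-is-better and is therefore subtracted), and the `relative_gain` formula are all assumptions. The two toy scorers are hypothetical stand-ins for LPIPS and an NR-IQA model. NumPy is used only to show the arithmetic; actual fine-tuning would need differentiable tensor operations.

```python
import numpy as np


def charbonnier_loss(pred, target, eps=1e-3):
    """Charbonnier loss: mean of sqrt(diff^2 + eps^2), a smooth L1 variant."""
    diff = pred - target
    return float(np.mean(np.sqrt(diff * diff + eps * eps)))


def toy_fr_loss(pred, target):
    """Hypothetical full-reference stand-in (lower is better); NOT LPIPS."""
    return float(np.mean(np.abs(pred - target)))


def toy_nr_score(img):
    """Hypothetical no-reference stand-in (higher is better); NOT a real IQA model.

    Uses mean horizontal gradient magnitude as a crude sharpness proxy.
    """
    return float(np.mean(np.abs(np.diff(img, axis=-1))))


def combined_loss(pred, target, fr_loss, nr_score, w_fr=1.0, w_nr=1.0):
    """Charbonnier base loss plus two additional perceptual components.

    The full-reference term is added (lower is better); the no-reference
    score is subtracted (assumed higher is better). Weights are illustrative.
    """
    base = charbonnier_loss(pred, target)
    return base + w_fr * fr_loss(pred, target) - w_nr * nr_score(pred)


def relative_gain(score_before, score_after):
    """Assumed definition: fractional change of a score relative to the baseline."""
    return (score_after - score_before) / abs(score_before)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((8, 8))
    pred = target + 0.05 * rng.standard_normal((8, 8))
    print("combined loss:", combined_loss(pred, target, toy_fr_loss, toy_nr_score))
    print("relative gain:", relative_gain(2.0, 2.5))
```

Pairing a no-reference term with a full-reference one reflects the captions' observation that combinations behave better than a single NR-IQA loss: the full-reference term ties the output to the ground truth, which plausibly limits how far the model can drift toward artifacts that inflate the no-reference score.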