NTIRE 2025 Challenge on Video Quality Enhancement for Video Conferencing: Datasets, Methods and Results

Varun Jain, Zongwei Wu, Quan Zou, Louis Florentin, Henrik Turbell, Sandeep Siddhartha, Radu Timofte, others

TL;DR

The NTIRE 2025 Challenge on Video Quality Enhancement for Video Conferencing targets studio-like video quality in real-time conferencing by correcting foreground lighting, color, noise, and sharpness via a differentiable VQA-guided framework. A dual data strategy combines unpaired real footage with paired synthetic relighting data, while a pre-trained Siamese VQA model enables objective-guided ranking of submissions under strict hardware constraints. LUT-based approaches dominate the results due to their efficiency and temporal stability. Several entrants (e.g., TMobileRestore, DeepView) deliver competitive performance through staged color correction and lightweight restoration branches, while other teams explore LUT fusion, alternative color-space models such as HVI-CIDNet, and perceptual quality losses. The work advances practical VQE by detailing datasets, evaluation protocols, and multiple hardware-friendly methods, underscoring the viability of real-time, perceptually aligned video enhancement for conferencing applications.

Abstract

This paper presents a comprehensive review of the 1st Challenge on Video Quality Enhancement for Video Conferencing held at the NTIRE workshop at CVPR 2025, and highlights the problem statement, datasets, proposed solutions, and results. The aim of this challenge was to design a Video Quality Enhancement (VQE) model to enhance video quality in video conferencing scenarios by (a) improving lighting, (b) enhancing colors, (c) reducing noise, and (d) enhancing sharpness, giving a professional studio-like effect. Participants were given a differentiable Video Quality Assessment (VQA) model along with training and test videos. A total of 91 participants registered for the challenge. We received 10 valid submissions that were evaluated in a crowdsourced framework.

Paper Structure

This paper contains 27 sections, 9 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Ground truth from (top) our synthetics framework, (bottom) the AutoAdjust solution. The top row shows the input with suboptimal foreground illumination, which is fixed by adding a studio light setup in front of the subject. This setup is simulated in the synthetic data and predicted via global changes in the real data.
  • Figure 2: Comparison of lighting setup in the Synthetic Portrait Relighting dataset. (left) Lighting from the HDRI, (center) key light with HDRI lighting turned off, and (right) key and fill lights with HDRI lighting turned off. Note that the HDRI is only used as a background when using the studio lighting and does not contribute to the illumination of the subject.
  • Figure 3: Color intensity in source and target images of our Synthetic Portrait Relighting dataset. The source images are dark with intensity centered around $50$. The target fixes this by boosting illumination, making it more uniform and spanning a larger range.
  • Figure 4: Interval plots illustrating the mean P.910 Bradley-Terry scores and their corresponding $95$% confidence intervals for the $10$ submissions, input videos, and the provided baseline. (Top) Overall preference, and (bottom) factors influencing preference.
  • Figure 5: Two-stage video conferencing enhancement framework proposed by team TMobileRestore.
  • ...and 4 more figures
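
Figure 4 reports Bradley-Terry scores derived from pairwise preferences. As a rough illustration of what such a score is, the sketch below fits Bradley-Terry strengths to a matrix of pairwise win counts using the standard minorization-maximization update. It is not the challenge's P.910 evaluation pipeline and does not compute confidence intervals. The function name `fit_bradley_terry` and the toy counts are assumptions made for this example.

```python
import math


def fit_bradley_terry(wins, iters=200, tol=1e-10):
    """Fit Bradley-Terry strengths with the classic MM update.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Comparisons involving i, each weighted by 1 / (p_i + p_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(new_p)
        new_p = [x / norm for x in new_p]
        delta = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return p


# Hypothetical toy data: three methods, ten comparisons per pair.
wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
strengths = fit_bradley_terry(wins)
# Log-strengths are a common way to place the items on an interval scale.
log_scores = [math.log(s) for s in strengths]
print(strengths)
```

In the fitted model, the probability that item i is preferred over item j is `strengths[i] / (strengths[i] + strengths[j])`, so a larger strength means the item is preferred more often.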