Towards a Perceptual Evaluation Framework for Lighting Estimation

Justine Giroux, Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy, Javier Vazquez-Corral, Jean-François Lalonde

TL;DR

In a controlled psychophysical experiment, human observers choose their preferred render among scenes lit using a set of lighting estimation algorithms selected from the recent literature. The results demonstrate that none of the most popular IQA metrics from the literature, taken individually, correctly represents human perception.

Abstract

Progress in lighting estimation is tracked by computing existing image quality assessment (IQA) metrics on images from standard datasets. While this may appear to be a reasonable approach, we demonstrate that these metrics do not correlate with human preference when the estimated lighting is used to relight a virtual scene into a real photograph. To study this, we design a controlled psychophysical experiment where human observers must choose their preference amongst rendered scenes lit using a set of lighting estimation algorithms selected from the recent literature, and use it to analyse how these algorithms perform according to human perception. Then, we demonstrate that none of the most popular IQA metrics from the literature, taken individually, correctly represents human perception. Finally, we show that by learning a combination of existing IQA metrics, we can more accurately represent human preference. This provides a new perceptual framework to help evaluate future lighting estimation algorithms.

Paper Structure (18 sections, 3 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: We pit image comparison metrics, used to quantify the performance of lighting estimation algorithms, against human perception. When observers are asked which render looks most plausible, our controlled psychophysical study reveals that human preference contradicts image metrics in the vast majority of cases. This paper questions the current practice of employing image quality metrics for evaluating lighting estimation algorithms when used for the task of virtual object insertion: can we do better by considering human perception?
  • Figure 2: Examples of the accuracy (task 1; left) and plausibility (task 2; right) tasks, for the diffuse (top) and glossy (bottom) materials, assigned to the observers during the experiment. The question asked to the observers is written above each example.
  • Figure 3: Example of the stimuli produced for a scene by each of the lighting estimation methods (columns), for the different tasks and experiments (rows). The last row corresponds to the lighting estimated by each of the methods (projected to an equirectangular format) and used for the renders. The "GT" columns correspond to the ground truth; note that it is not used in task 2 and is shown here only for reference.
  • Figure 4: Thurstone Case V Law of Comparative Judgement ($z$-scores) for all the observers as a function of the different lighting estimation methods (bars), for the different materials (rows) and tasks (columns). A positive score indicates that observers generally prefer the stimuli rendered with the lighting estimation method, while a negative score indicates that they generally do not. The scores of all the methods for an experiment sum to 0.0. The brackets above indicate pairs of methods for which the perceptual difference is statistically significant. Error bars correspond to 95% confidence intervals.
  • Figure 5: Agreement between the observer scores and the metric scores (columns) for all the lighting estimation methods (indoor: orange bars; outdoor: blue bars), for the different types of experiments (rows). The lower horizontal grey line is set at chance level (approximately 0.5) and the higher one corresponds to the perfect observer (set at 1.0). The orange (indoor) and blue (outdoor) lines correspond to the expected observer agreement score (for task 2, diffuse, the blue and orange lines overlap). The stars indicate methods that have an agreement score equal to or higher than that of the expected observer. "Ours" and "Ours Holdout" refer to our learned metric combination; see the section on the learned metric for more details. The No-Reference IQA metrics are indicated by asterisks.
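
The $z$-scores in Figure 4 come from Thurstone's Case V model of paired comparisons. As a rough illustration of how such scores can be derived from pairwise preference counts, the sketch below uses only the Python standard library. It is not the authors' implementation: the function name `thurstone_case_v`, the clipping value `eps`, the choice to average over all methods (including the zero self-comparison), and the example `wins` counts are all assumptions made for illustration.

```python
from statistics import NormalDist


def thurstone_case_v(wins, eps=0.01):
    """Estimate Thurstone Case V scale values from pairwise win counts.

    wins[i][j] is the number of times method i was preferred over method j.
    eps (an assumed value) clips proportions away from 0 and 1 so that the
    inverse normal CDF stays finite for unanimous pairs.
    """
    n = len(wins)
    norm = NormalDist()  # standard normal, mean 0 and standard deviation 1
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # self-comparison contributes a z-value of 0
            n_ij = wins[i][j] + wins[j][i]
            if n_ij == 0:
                continue  # pair never compared: treated as no preference
            p = wins[i][j] / n_ij            # proportion preferring i over j
            p = min(max(p, eps), 1.0 - eps)  # symmetric clipping
            total += norm.inv_cdf(p)         # proportion -> z-value
        # Averaging over all n methods; the z-values are antisymmetric,
        # so the resulting scores sum to zero up to rounding error.
        scores.append(total / n)
    return scores


# Hypothetical counts for three methods, 20 judgements per pair.
wins = [
    [0, 14, 18],
    [6, 0, 12],
    [2, 8, 0],
]
scores = thurstone_case_v(wins)
for index, score in enumerate(scores):
    print(f"method {index}: {score:+.3f}")
```

Because each pair contributes a z-value to one method and its negation to the other, the scores sum to zero, which matches the property stated in the Figure 4 caption. The confidence intervals and significance brackets shown in the figure would need an additional procedure (for example, resampling observers) that this sketch does not cover.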