CrossScore: Towards Multi-View Image Evaluation and Scoring

Zirui Wang; Wenjing Bian; Victor Adrian Prisacariu

TL;DR

CrossScore evaluates a query image by comparing it against multiple views of the same scene, addressing the limitations of existing metrics in novel view synthesis (NVS) and similar tasks where direct reference images are unavailable.

Abstract

We introduce a novel cross-reference image quality assessment method that fills a gap in the image assessment landscape, complementing established evaluation schemes: full-reference metrics such as SSIM, no-reference metrics such as NIQE, general-reference metrics such as FID, and multi-modal-reference metrics such as CLIPScore. Using a neural network with a cross-attention mechanism and a unique data collection pipeline built on novel view synthesis (NVS) optimisation, our method enables accurate image quality assessment without requiring ground truth references. By comparing a query image against multiple views of the same scene, our method addresses the limitations of existing metrics in NVS and similar tasks where direct reference images are unavailable. Experimental results show that our method correlates closely with the full-reference metric SSIM while not requiring ground truth references.
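The abstract describes the model only at a high level. The following NumPy sketch illustrates the general idea of cross-reference scoring with single-head cross-attention: patch features of the query image attend to patch features pooled from all reference images, and a small head maps the fused features to a per-patch score. Everything concrete here is an assumption for illustration, not the authors' implementation: the function names (`softmax`, `cross_reference_scores`), the random stand-in features in place of a real image encoder $\Phi_{\text{enc}}$, the single attention head standing in for $\Phi_{\text{cross}}$, and the linear-plus-sigmoid head standing in for $\Phi_{\text{dec}}$.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_reference_scores(query_feats, ref_feats, w_q, w_k, w_v, w_out):
    """Hypothetical single-head cross-attention scorer (illustrative only).

    query_feats: (P, D) patch features of the query image.
    ref_feats:   (N_ref, P, D) patch features of the reference images.
    Returns per-patch scores in (0, 1), shape (P,), and attention weights,
    shape (P, N_ref * P).
    """
    n_ref, p, d = ref_feats.shape
    refs = ref_feats.reshape(n_ref * p, d)  # pool all reference patches into one key/value set
    q = query_feats @ w_q                   # (P, d_k) queries from the query image
    k = refs @ w_k                          # (N_ref * P, d_k) keys from the references
    v = refs @ w_v                          # (N_ref * P, d_k) values from the references
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # each row sums to 1
    fused = attn @ v                        # (P, d_k) reference evidence per query patch
    logits = np.concatenate([query_feats, fused], axis=-1) @ w_out  # (P,)
    scores = 1.0 / (1.0 + np.exp(-logits))  # sigmoid maps logits to (0, 1)
    return scores, attn


# Usage with random stand-in features; a real system would obtain them from an image encoder.
rng = np.random.default_rng(0)
P, D, DK, N_REF = 16, 32, 8, 5
query = rng.standard_normal((P, D))
refs = rng.standard_normal((N_REF, P, D))
w_q, w_k, w_v = (0.1 * rng.standard_normal((D, DK)) for _ in range(3))
w_out = 0.1 * rng.standard_normal(D + DK)
scores, attn = cross_reference_scores(query, refs, w_q, w_k, w_v, w_out)
print(scores.shape, attn.shape)  # (16,) (16, 80)
```

In this sketch, each row of the returned attention matrix shows how one query patch distributes its attention over all reference patches, which is the kind of quantity Figure 5 visualises. Pooling the five references into a single key/value set lets a query patch draw evidence from whichever view happens to observe the same surface; the sigmoid output range is a choice made here for simplicity, not a detail taken from the paper.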

Paper Structure (36 sections, 3 equations, 8 figures, 8 tables)

Introduction
Related Work
Image Quality Assessment Metrics
Full-Reference Metrics
Reduced-Reference Metrics
No-Reference Metrics
General-Reference Metrics
Multi-Modal-Reference Metrics
Image Quality Assessment in NVS Systems
Method
Network Design
Image Encoder $\Phi_{\text{enc}}$
Cross-Reference Module $\Phi_{\text{cross}}$
Score Regression Head $\Phi_{\text{dec}}$
Training Strategy
...and 21 more sections

Figures (8)

Figure 1: We propose a novel cross-reference (CR) image quality assessment (IQA) scheme, which evaluates a query image using multiple unregistered reference images captured from different viewpoints. This approach opens a research direction distinct from conventional IQA schemes such as full-reference (FR), general-reference (GR), no-reference (NR), and multi-modal-reference (MMR).
Figure 2: Data Generation and Training Pipeline. We employ existing NVS models to generate pairs of rendered images and SSIM maps for training. As NVS optimisation progresses, images rendered at various optimisation stages serve as query images for our model. Together with a set of reference images from the same scene, our model predicts a score map, which is supervised by the corresponding SSIM map. See the Network Design and Training Strategy sections for more details.
Figure 3: Qualitative results of CrossScore and SSIM on various datasets. We present example test results for each dataset (from left to right: RE10K, MFR, Mip360). Our score maps correlate strongly with the SSIM maps, demonstrating that our approach generalises across diverse datasets. Score colour coding: red, orange, green, and blue indicate progressively lower scores.
Figure 4: Illustration of two IQA approaches in NVS: 1) with subsampled test views and 2) with true novel views. The first approach relies on full-reference metrics that require ground truth images, which precludes using the test views for training (blue circles enclosed in orange boxes). In contrast, our cross-reference approach bypasses the need for ground truth views, allowing NVS evaluation from true novel views (green circles) and enabling NVS models to use the entire captured image set.
Figure 5: Visualisation of attention weights from the cross-reference module $\Phi_{\text{cross}}$. Top left: a query image with a region of interest (centre of the image) highlighted by a magenta box. Right column: 3 reference images from our cross-reference set with attention maps overlaid. The attention maps show how much attention each reference region receives when predicting image quality at the query region. Red and blue denote high and low attention weights respectively. Note that we use $N_\text{ref}=5$, but only 3 references are shown due to space constraints. Bottom: predicted CrossScore map and SSIM map. Red and blue denote high and low quality image regions respectively.
...and 3 more figures
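Figure 2 states that the predicted score map is supervised by an SSIM map computed between a rendered image and its ground truth view, and Figure 3 compares the two kinds of map. The NumPy sketch below shows one way to compute a per-pixel SSIM map and to measure agreement between maps. Several details are assumptions rather than taken from the paper: grayscale inputs in [0, 1], a simplified SSIM with a uniform 7x7 window (the original SSIM formulation uses a Gaussian window), an L1 training loss, Pearson correlation as the agreement measure, and the hypothetical helper names `local_mean`, `ssim_map`, and `pearson`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean(x, win):
    # Mean over every win x win window ("valid" region, no padding).
    return sliding_window_view(x, (win, win)).mean(axis=(-2, -1))


def ssim_map(x, y, win=7, data_range=1.0):
    """Simplified per-pixel SSIM between two grayscale images (uniform window).

    Returns an (H - win + 1, W - win + 1) map with values in [-1, 1].
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mu_x, mu_y = local_mean(x, win), local_mean(y, win)
    var_x = local_mean(x * x, win) - mu_x ** 2
    var_y = local_mean(y * y, win) - mu_y ** 2
    cov_xy = local_mean(x * y, win) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def pearson(a, b):
    # Pearson correlation coefficient between two maps of equal shape.
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


# Usage with synthetic stand-in data (no real NVS renderings involved).
rng = np.random.default_rng(0)
gt = rng.random((64, 64))  # stand-in ground-truth view
render = np.clip(gt + 0.1 * rng.standard_normal(gt.shape), 0.0, 1.0)  # stand-in rendering
target = ssim_map(render, gt)  # supervision target, shape (58, 58)
pred = np.clip(target + 0.01 * rng.standard_normal(target.shape), -1.0, 1.0)  # stand-in prediction
loss = np.abs(pred - target).mean()  # assumed L1 objective
r = pearson(pred, target)  # agreement between predicted and SSIM maps
print(target.shape, round(loss, 4), round(r, 3))
```

The "valid" windowing shrinks a 64x64 input to a 58x58 map; a practical implementation would more likely pad or filter the image so the score map matches the input resolution.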
