Non-Aligned Reference Image Quality Assessment for Novel View Synthesis

Abhijay Ghildyal, Rajesh Sureddi, Nabajeet Barman, Saman Zadtootaghaj, Alan Bovik

TL;DR

The paper tackles perceptual quality assessment for novel view synthesis (NVS) when pixel-aligned references are unavailable. It introduces the Non-Aligned Reference IQA (NAR-IQA) setting and the NOVA model, which is trained with contrastive learning on a LoRA-enhanced DINOv2 backbone using synthetic distortions localized within Temporal Regions of Interest (TROIs). By combining supervision from existing IQA models with KL regularization and carefully curated triplets, NOVA achieves state-of-the-art performance in both aligned and non-aligned reference settings and correlates strongly with human judgments on NVS benchmarks. The work also provides a large synthetic training dataset, a comprehensive NVS NAR-IQA benchmark, and supplementary visualizations that aid interpretability, underscoring its practical value for real-world NVS quality assessment, where aligned references are scarce.
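The TL;DR names the training ingredients (contrastive learning on curated triplets, supervision from existing IQA models, and KL regularization) but not the exact loss. The following is a minimal sketch of how such pieces are commonly combined, assuming a standard triplet margin loss on embedding distances plus a KL term that pulls the model's two-way preference probability toward a teacher probability derived from an existing IQA model. The names `teacher_p`, `margin`, `lam`, and the softmax-over-distances preference are hypothetical choices for illustration, not NOVA's published objective.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(ref, better, worse, margin=0.2):
    """Hinge loss: the less-distorted view should sit closer to the reference
    than the more-distorted view by at least `margin` (assumed value)."""
    return max(0.0, l2_distance(ref, better) - l2_distance(ref, worse) + margin)


def preference_prob(d_better, d_worse, temperature=1.0):
    """Probability that the 'better' view is preferred, via a numerically
    stable softmax over negative distances (an assumed formulation)."""
    s_b, s_w = -d_better / temperature, -d_worse / temperature
    m = max(s_b, s_w)
    e_b, e_w = math.exp(s_b - m), math.exp(s_w - m)
    return e_b / (e_b + e_w)


def bernoulli_kl(p, q, eps=1e-8):
    """KL(p || q) between two Bernoulli distributions, clipped for safety."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def training_loss(ref, better, worse, teacher_p, margin=0.2, lam=0.5):
    """Hypothetical combined objective: triplet term + lam * KL(teacher || model).

    teacher_p: hypothetical probability, derived from an existing IQA model,
    that `better` is the higher-quality view.
    """
    contrastive = triplet_margin_loss(ref, better, worse, margin)
    student_p = preference_prob(l2_distance(ref, better), l2_distance(ref, worse))
    return contrastive + lam * bernoulli_kl(teacher_p, student_p)
```

When the embeddings already separate the two views by more than the margin, the triplet term is zero and only the KL term remains, so the teacher signal keeps shaping how confident the model's preference is.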

Abstract

Evaluating the perceptual quality of Novel View Synthesis (NVS) images remains a key challenge, particularly in the absence of pixel-aligned ground truth references. Full-Reference Image Quality Assessment (FR-IQA) methods fail under misalignment, while No-Reference (NR-IQA) methods struggle with generalization. In this work, we introduce a Non-Aligned Reference (NAR-IQA) framework tailored for NVS, where it is assumed that the reference view shares partial scene content but lacks pixel-level alignment. We constructed a large-scale image dataset containing synthetic distortions targeting Temporal Regions of Interest (TROI) to train our NAR-IQA model. Our model is built on a contrastive learning framework that incorporates LoRA-enhanced DINOv2 embeddings and is guided by supervision from existing IQA methods. We train exclusively on synthetically generated distortions, deliberately avoiding overfitting to specific real NVS samples and thereby enhancing the model's generalization capability. Our model outperforms state-of-the-art FR-IQA, NR-IQA, and NAR-IQA methods, achieving robust performance on both aligned and non-aligned references. We also conducted a novel user study to gather data on human preferences when viewing non-aligned references in NVS. We find strong correlation between our proposed quality prediction model and the collected subjective ratings. For dataset and code, please visit our project page: https://stootaghaj.github.io/nova-project/
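LoRA keeps the pretrained DINOv2 weights frozen and learns only a low-rank additive update, so the adapted layer computes W x + (alpha / r) · B A x. The sketch below shows that mechanism on a single linear layer in plain Python; the rank, `alpha`, initialization scale, and the class name `LoRALinear` are assumptions for illustration, and which DINOv2 layers the authors adapt is not stated in this summary.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class LoRALinear:
    """Hypothetical LoRA layer: frozen weights W plus trainable (alpha / r) * B @ A."""

    def __init__(self, W, rank=4, alpha=8.0, seed=0):
        rng = random.Random(seed)
        self.W = W  # frozen pretrained weights, shape (d_out, d_in)
        d_out, d_in = len(W), len(W[0])
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the update is
        # exactly zero at initialization and the layer matches W.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def __call__(self, x):
        base = matvec(self.W, x)                    # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Count of parameters in A and B (W is not trained)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

Because `B` starts at zero, the adapted layer reproduces the frozen layer exactly at the start of fine-tuning. For a hypothetical 768×768 projection at rank 4, only 6,144 parameters are trainable instead of 589,824.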

Paper Structure

This paper contains 28 sections, 3 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: Visual comparison of aligned and non-aligned references for assessing the quality of distorted images $I_0$ and $I_1$. Each row shows a sample from the dataset with an aligned reference, a non-aligned reference, and two distorted views. The table below highlights metric performance (e.g., LPIPS, DISTS, CrossScore) and human preferences under both reference settings.
  • Figure 2: Views along the camera trajectory used to train the NeRF/GS models are indicated, while the excluded segments simulate novel views. These segments are used to create distorted image triplets with both aligned and non-aligned references.
  • Figure 3: An example illustrating the synthetic dataset generation strategy, where distortions are applied locally within Temporal Regions of Interest (TROIs).
  • Figure 4: Overview of our model architecture. The network consists of a LoRA-enhanced DINOv2 backbone with dual outputs for embedding, trained using contrastive and auxiliary losses.
  • Figure 5: Quantitative comparison (accuracy in %) of IQA models on the NVS NAR-IQA benchmark. We evaluate full-reference, non-aligned reference, and no-reference metrics under both aligned and non-aligned reference conditions.
  • ...and 4 more figures
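Figures 1 and 5 pair human preferences between two distorted views with metric accuracy in percent. One common way to compute such an accuracy is two-alternative forced choice (2AFC) agreement: the percentage of triplets in which a metric ranks I_0 and I_1 the same way as the human majority. The sketch below assumes that protocol and a distance-style metric (lower means closer to the reference); the benchmark's exact scoring rule, including how ties or split votes are handled, is not specified in this summary, and `two_afc_accuracy` is a hypothetical helper.

```python
def two_afc_accuracy(dist0, dist1, human_prefers_1):
    """Percentage of triplets where the metric picks the same view as humans.

    dist0, dist1: metric distances from the reference to I_0 and I_1
                  (lower = closer to the reference = judged better).
    human_prefers_1: True if the human majority judged I_1 higher quality.
    Ties (d1 == d0) are counted as the metric preferring I_0 (an assumption).
    """
    if not (len(dist0) == len(dist1) == len(human_prefers_1)):
        raise ValueError("inputs must have equal length")
    if not dist0:
        raise ValueError("need at least one triplet")
    correct = 0
    for d0, d1, h in zip(dist0, dist1, human_prefers_1):
        metric_prefers_1 = d1 < d0
        correct += metric_prefers_1 == h
    return 100.0 * correct / len(dist0)


# Usage: the metric agrees with humans on 3 of 4 triplets.
print(two_afc_accuracy([0.2, 0.5, 0.3, 0.1],
                       [0.4, 0.1, 0.35, 0.05],
                       [False, True, True, True]))  # 75.0
```

For a quality score where higher means better (as with many NR-IQA models), negate the scores before passing them in so the "lower is better" convention holds.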