Appreciate the View: A Task-Aware Evaluation Framework for Novel View Synthesis

Saar Stern; Ido Sobol; Or Litany

TL;DR

This work addresses the challenge of reliably evaluating novel view synthesis (NVS) by introducing a task-aware framework that leverages diffusion-model features from the strong NVS backbone Zero123-XL. The authors construct PRISM, which consists of a reference-based distance, $D_{\text{PRISM}}$, and a reference-free distributional measure, $\text{MMD}_{\text{PRISM}}$. Both rely on features tuned through contrastive fine-tuning on a purpose-built ViewMatch dataset to discriminate plausible from implausible view syntheses. Across multiple benchmarks (Toys4K, Google Scanned Objects, OmniObject3D) and human studies, $D_{\text{PRISM}}$ shows strong alignment with human preferences, while $\text{MMD}_{\text{PRISM}}$ yields stable, interpretable model rankings without ground-truth targets. The framework is robust to pose misalignment and image degradations, offering a principled, practical pathway toward more reliable progress in single-view NVS evaluation. Its main limitation is dependence on the Zero123-XL backbone; extension to scene-level scenarios is noted as a potential direction.

Abstract

The goal of Novel View Synthesis (NVS) is to generate realistic images of a given content from unseen viewpoints. But how can we trust that a generated image truly reflects the intended transformation? Evaluating its reliability remains a major challenge. While recent generative models, particularly diffusion-based approaches, have significantly improved NVS quality, existing evaluation metrics struggle to assess whether a generated image is both realistic and faithful to the source view and intended viewpoint transformation. Standard metrics, such as pixel-wise similarity and distribution-based measures, often mis-rank incorrect results as they fail to capture the nuanced relationship between the source image, viewpoint change, and generated output. We propose a task-aware evaluation framework that leverages features from a strong NVS foundation model, Zero123, combined with a lightweight tuning step to enhance discrimination. Using these features, we introduce two complementary evaluation metrics: a reference-based score, $D_{\text{PRISM}}$, and a reference-free score, $\text{MMD}_{\text{PRISM}}$. Both reliably identify incorrect generations and rank models in agreement with human preference studies, addressing a fundamental gap in NVS evaluation. Our framework provides a principled and practical approach to assessing synthesis quality, paving the way for more reliable progress in novel view synthesis. To further support this goal, we apply our reference-free metric to six NVS methods across three benchmarks: Toys4K, Google Scanned Objects (GSO), and OmniObject3D, where $\text{MMD}_{\text{PRISM}}$ produces a clear and stable ranking, with lower scores consistently indicating stronger models.
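To make the idea of a reference-free distributional score concrete, the following is a minimal sketch of a squared maximum mean discrepancy (MMD) estimate between two sets of feature vectors. It assumes that MMD here denotes the standard maximum mean discrepancy; the RBF kernel, the bandwidth `sigma`, the biased estimator, and the random placeholder features are illustrative assumptions and are not taken from the paper, which computes its score on tuned Zero123 features.

```python
import numpy as np


def rbf_kernel(x, y, sigma):
    """Gaussian (RBF) kernel matrix between rows of x and rows of y."""
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    sq = (x ** 2).sum(1)[:, None] + (y ** 2).sum(1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def mmd2(x, y, sigma=4.0):
    """Biased (V-statistic) estimate of squared MMD between feature sets.

    x: (n, d) array, y: (m, d) array. Lower means the two feature
    distributions are closer under the chosen kernel.
    """
    k_xx = rbf_kernel(x, x, sigma).mean()
    k_yy = rbf_kernel(y, y, sigma).mean()
    k_xy = rbf_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for features; NOT real Zero123 features.
    reference_feats = rng.normal(0.0, 1.0, size=(200, 16))
    good_model_feats = rng.normal(0.0, 1.0, size=(200, 16))  # same distribution
    bad_model_feats = rng.normal(1.0, 1.0, size=(200, 16))   # shifted distribution

    print("good model:", mmd2(reference_feats, good_model_feats))
    print("bad model: ", mmd2(reference_feats, bad_model_feats))
```

In this toy setup the shifted feature set receives the larger score, which mirrors the convention stated in the abstract that lower scores indicate stronger models.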
