PR-IQA: Partial-Reference Image Quality Assessment for Diffusion-Based Novel View Synthesis

Inseong Choi, Siwoo Lee, Seung-Hun Nam, Soohwan Song

Abstract

Diffusion models are promising for sparse-view novel view synthesis (NVS), as they can generate pseudo-ground-truth views to aid 3D reconstruction pipelines like 3D Gaussian Splatting (3DGS). However, these synthesized images often contain photometric and geometric inconsistencies, and their direct use for supervision can impair reconstruction. To address this, we propose Partial-Reference Image Quality Assessment (PR-IQA), a framework that evaluates diffusion-generated views using reference images from different poses, eliminating the need for ground truth. PR-IQA first computes a geometrically consistent partial quality map in overlapping regions. It then performs quality completion to inpaint this partial map into a dense, full-image map. This completion is achieved via a cross-attention mechanism that incorporates reference-view context, ensuring cross-view consistency and enabling thorough quality assessment. When integrated into a diffusion-augmented 3DGS pipeline, PR-IQA restricts supervision to high-confidence regions identified by its quality maps. Experiments demonstrate that PR-IQA outperforms existing IQA methods, achieving full-reference-level accuracy without ground-truth supervision. Thus, our quality-aware 3DGS approach more effectively filters inconsistencies, producing superior 3D reconstructions and NVS results. The project page is available at https://kakaomacao.github.io/pr-iqa-project-page/.

Paper Structure

This paper contains 59 sections, 12 equations, 16 figures, and 16 tables.

Figures (16)

  • Figure 1: Overview of the proposed PR-IQA and quality-aware 3DGS. (a) Diffusion models generate novel views (pseudo-GTs) from sparse inputs, which often contain photometric or geometric artifacts. (b) We propose PR-IQA, a cross-reference method predicting a dense, pixel-level quality map from unaligned references. It produces a complete map correlating closely with FR-IQA metrics (e.g., DINOv2 feature-similarity map) without requiring a GT. (c) This quality map enables a dual-filtering strategy (image selection and pixel masking) for 3DGS training, reducing reconstruction errors and improving fidelity.
  • Figure 2: (a) Overview of the PR-IQA pipeline. The framework operates in two stages. First, we warp DINOv2 features from the reference $I_r$ to the query $I_q$ view via dense stereo matching, generating a partial quality map ($\hat{Q}$) for the overlapping regions. Next, a three-stream (query, reference, partial map) encoder-decoder predicts the full quality map $Q$. (b) Architecture of the Dual-Gated Attention Block. The block sequentially applies two attention mechanisms: a Channel Attention Module (using max/avg pooling and an MLP) recalibrates channels, and a Spatial Attention Module (using Q, K, V projections and softmax) provides spatial refinement. The block integrates both with normalization, residual connections ($\oplus$), and an FFN. Each encoder and decoder stage is composed of these blocks.
  • Figure 3: Qualitative comparison of estimated quality maps from IQA methods. Colors encode estimated quality, where low-quality pixels are shown in blue and high-quality pixels are shown in red. Compared to baselines, our results ("Ours") more faithfully recover object silhouettes and fine structures, closely matching the GT (DINOv2-SIM).
  • Figure 4: Qualitative comparison of rendered novel views from IQA-guided 3DGS. While baseline methods produce results with artifacts, blurring, or misaligned Gaussians, our PR-IQA-guided method ("Ours") avoids these failure modes, yielding significantly cleaner and more coherent reconstructions.
  • Figure 5: Detailed architecture of the proposed model. The network employs an encoder–decoder design featuring cross- and self-attention modules, query fusion, and mask-aware pixel-shuffle downsampling. Key specifications, including stage-wise block counts, attention heads, and the status of component sharing (frozen vs. trainable), are explicitly annotated.
  • ...and 11 more figures
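The Dual-Gated Attention Block described in Figure 2(b) can be illustrated with a minimal NumPy sketch. This is an assumption-laden outline, not the authors' implementation: the weight shapes, the shared two-layer MLP in the channel gate, and the single-head spatial attention are illustrative choices; the paper's actual block counts and head configurations are annotated in Figure 5.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def channel_attention(x, w1, w2):
    # x: (C, H, W). Max- and avg-pool over spatial dims -> two (C,) descriptors,
    # pass both through a shared two-layer MLP, sum, and squash to a channel gate.
    mx = x.max(axis=(1, 2))
    av = x.mean(axis=(1, 2))
    gate = sigmoid(w2 @ np.maximum(w1 @ mx, 0) + w2 @ np.maximum(w1 @ av, 0))
    return x * gate[:, None, None]

def spatial_attention(x, wq, wk, wv):
    # Flatten (C, H, W) to (H*W, C) tokens, then standard scaled dot-product
    # attention with learned Q, K, V projections.
    c, h, w = x.shape
    t = x.reshape(c, h * w).T
    q, k, v = t @ wq, t @ wk, t @ wv
    a = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return (a @ v).T.reshape(c, h, w)

def dual_gated_block(x, p):
    # Channel recalibration, then spatial refinement; each sub-module is
    # wrapped with a residual connection and layer norm over channels.
    y = x + channel_attention(x, p["w1"], p["w2"])
    y = layer_norm(y.transpose(1, 2, 0)).transpose(2, 0, 1)
    z = y + spatial_attention(y, p["wq"], p["wk"], p["wv"])
    z = layer_norm(z.transpose(1, 2, 0)).transpose(2, 0, 1)
    # Position-wise FFN over channels, with a final residual.
    t = z.reshape(z.shape[0], -1).T
    ffn = np.maximum(t @ p["f1"], 0) @ p["f2"]
    return (t + ffn).T.reshape(z.shape)

C, H, W, R = 8, 4, 4, 16  # toy sizes for illustration
params = {
    "w1": rng.standard_normal((C // 2, C)) * 0.1,
    "w2": rng.standard_normal((C, C // 2)) * 0.1,
    "wq": rng.standard_normal((C, C)) * 0.1,
    "wk": rng.standard_normal((C, C)) * 0.1,
    "wv": rng.standard_normal((C, C)) * 0.1,
    "f1": rng.standard_normal((C, R)) * 0.1,
    "f2": rng.standard_normal((R, C)) * 0.1,
}
x = rng.standard_normal((C, H, W))
y = dual_gated_block(x, params)
print(y.shape)  # (8, 4, 4)
```

Note how the block preserves the feature-map shape, so it can be stacked freely inside each encoder and decoder stage.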
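The dual-filtering strategy from Figure 1(c) — image selection plus pixel masking — amounts to gating the 3DGS photometric loss with the predicted quality map. A minimal sketch, assuming an L1 photometric term and hypothetical thresholds (`pixel_thresh`, `image_thresh` are illustrative names, not values from the paper):

```python
import numpy as np

def quality_filtered_loss(rendered, pseudo_gt, quality_map,
                          pixel_thresh=0.5, image_thresh=0.3):
    """Dual filtering for quality-aware 3DGS supervision.

    Image selection: discard the pseudo-GT view entirely if its mean
    predicted quality is too low. Pixel masking: restrict the L1 loss
    to high-confidence pixels in the views that survive.
    """
    if quality_map.mean() < image_thresh:
        return 0.0  # whole image rejected from supervision
    mask = (quality_map >= pixel_thresh).astype(np.float64)
    n = mask.sum()
    if n == 0:
        return 0.0  # no confident pixels to supervise on
    return float((mask * np.abs(rendered - pseudo_gt)).sum() / n)

# Toy example: one low-quality pixel (0.2) is masked out of the loss.
q = np.array([[0.9, 0.2], [0.8, 0.7]])
rendered = np.zeros((2, 2))
pseudo_gt = np.ones((2, 2))
print(quality_filtered_loss(rendered, pseudo_gt, q))  # 1.0
```

Averaging only over the masked pixels keeps the loss magnitude comparable across views with different amounts of high-confidence area.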