AIM 2024 Sparse Neural Rendering Challenge: Methods and Results

Michal Nazarczuk, Sibi Catley-Chandar, Thomas Tanay, Richard Shaw, Eduardo Pérez-Pellitero, Radu Timofte, Xing Yan, Pan Wang, Yali Guo, Yongxin Wu, Youcheng Cai, Yanan Yang, Junting Li, Yanghong Zhou, P. Y. Mok, Zongqi He, Zhe Xiao, Kin-Chung Chan, Hana Lebeta Goshu, Cuixin Yang, Rongkang Dong, Jun Xiao, Kin-Man Lam, Jiayao Hao, Qiong Gao, Yanyan Zu, Junpei Zhang, Licheng Jiao, Xu Liu, Kuldeep Purohit

TL;DR

The paper presents the AIM 2024 Sparse Neural Rendering Challenge, which benchmarks sparse-view novel view synthesis on the SpaRe and DTU datasets across two tracks with 3 and 9 input views, respectively. It surveys the submitted methods, which are diverse per-scene optimisation approaches built on FreeNeRF, including teacher-student frameworks, depth- and feature-based priors, and depth-guided regularisation, and which improve substantially over the baselines. In Track 1, team wang_pan achieves the best masked PSNR and perceptual metrics; in Track 2, the 9 input views yield larger improvements, underscoring the value of additional input views and priors. The work standardises evaluation in sparse neural rendering, highlights effective strategies for handling shape-radiance ambiguity under sparse observations, and sets a baseline for future research and competition-driven progress.

Abstract

This paper reviews the challenge on Sparse Neural Rendering that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2024. This manuscript focuses on the competition set-up, the proposed methods and their respective results. The challenge aims to synthesise novel camera views of diverse scenes from sparse image observations. It is composed of two tracks with differing levels of sparsity: 3 views in Track 1 (very sparse) and 9 views in Track 2 (sparse). Participants are asked to optimise objective fidelity to the ground-truth images, as measured by the Peak Signal-to-Noise Ratio (PSNR). For both tracks, we use the newly introduced Sparse Rendering (SpaRe) dataset and the popular DTU MVS dataset. In this challenge, 5 teams submitted final results to Track 1 and 4 teams submitted final results to Track 2. The submitted models are varied and push the boundaries of the current state of the art in sparse neural rendering. A detailed description of all models developed in the challenge is provided in this paper.
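
Both tracks rank submissions by PSNR, and the TL;DR also reports a masked PSNR. A minimal Python/NumPy sketch of both follows, assuming images stored as float arrays in [0, 1] and a boolean H x W mask marking the pixels to evaluate. The challenge's exact masking protocol (for example, which object masks are used on DTU) is not specified in this summary, so the `mask` argument is an illustrative assumption rather than the official evaluation code.

```python
import numpy as np


def psnr(pred, gt, mask=None, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for images with values in [0, max_val].

    pred, gt: arrays of shape (H, W, C).
    mask: optional boolean (H, W) array; if given, the MSE is averaged only
          over the selected pixels (an assumed definition of "masked PSNR").
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    sq_err = (pred - gt) ** 2
    if mask is not None:
        keep = np.asarray(mask, dtype=bool)
        if not keep.any():
            raise ValueError("mask selects no pixels")
        sq_err = sq_err[keep]  # boolean indexing -> shape (num_kept, C)
    mse = sq_err.mean()
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


if __name__ == "__main__":
    gt = np.zeros((4, 4, 3))
    pred = gt + 0.1  # uniform error of 0.1 -> MSE = 0.01
    print(round(psnr(pred, gt), 2))  # 20.0
```

For an image that is uniformly off by 0.1, the MSE is 0.01 and the PSNR is 20 dB; halving the RMS error adds about 6 dB. Restricting the average to a mask means background errors outside the object no longer lower the score.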

Paper Structure

This paper contains 21 sections, 12 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: An overview of FrameNeRF [xing2024] proposed by team wang_pan.
  • Figure 2: An overview of the method proposed by MikeLee. The framework learns and combines information from two neural fields that share geometry information: one branch learns an RGB field and the other learns a feature field. The colour prediction branch is conditioned on the prior learned by the feature branch. The network is trained to predict local features and colour at the pixel level in the sparse training views.
  • Figure 3: In the method proposed by MikeLee, the network predicting the colour $c$ of a point is explicitly conditioned on the local features of that point. Feature supervision supervises $f_i$ using prior knowledge from a pretrained network; feature conditioning concatenates the learned prior $f_i$ to the input of $M_c$ for colour prediction.
  • Figure 4: An overview of ESNeRF proposed by zongqihe. Colour- and depth-based losses are applied, in addition to "occlusion" regularisation and near-far field optimisation.
  • Figure 5: An overview of the method proposed by Thirteen.
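
The feature conditioning described for MikeLee's method (Figures 2 and 3) can be sketched as a forward pass through three small MLPs: a shared geometry trunk, a feature branch predicting $f_i$, and a colour branch $M_c$ whose input is the concatenation of the geometry feature, the view direction and $f_i$. This NumPy illustration is a sketch under stated assumptions, not the team's implementation: the widths `GEO_DIM` and `FEAT_DIM`, the names `trunk`, `feat_head`, `colour_head` and `feature_loss`, the ReLU density and the raw view directions are all hypothetical, and positional encoding and volume rendering along rays are omitted. In the described method, feature supervision acts on pixel-level features in the training views; `feature_loss` only shows the form of such a loss against a placeholder prior.

```python
import numpy as np

rng = np.random.default_rng(0)
GEO_DIM, FEAT_DIM = 32, 16  # hypothetical widths, not taken from the paper


def init_mlp(sizes):
    """Small random MLP as a list of (weight, bias) pairs."""
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Shared geometry trunk: 3D point -> density + geometry feature h.
trunk = init_mlp([3, 64, 1 + GEO_DIM])
# Feature branch: h -> local feature f_i (the learned prior).
feat_head = init_mlp([GEO_DIM, 32, FEAT_DIM])
# Colour branch M_c: [h, view direction d, f_i] -> RGB.
colour_head = init_mlp([GEO_DIM + 3 + FEAT_DIM, 64, 3])


def forward(points, dirs):
    out = mlp(trunk, points)
    sigma = np.maximum(out[:, 0], 0.0)  # non-negative density
    h = out[:, 1:]                      # geometry shared by both branches
    f = mlp(feat_head, h)               # learned feature prior f_i
    c_in = np.concatenate([h, dirs, f], axis=-1)  # feature conditioning
    rgb = sigmoid(mlp(colour_head, c_in))         # colours in (0, 1)
    return sigma, f, rgb


def feature_loss(f, f_prior):
    """Feature supervision: mean squared error to pretrained-network features."""
    return float(np.mean((f - f_prior) ** 2))


if __name__ == "__main__":
    pts = rng.uniform(-1.0, 1.0, (8, 3))
    dirs = pts / np.linalg.norm(pts, axis=-1, keepdims=True)
    sigma, f, rgb = forward(pts, dirs)
    print(sigma.shape, f.shape, rgb.shape)  # (8,) (8, 16) (8, 3)
```

Concatenating $f_i$ to the input of $M_c$ lets the colour prediction depend directly on the learned prior, while density is still produced only by the shared trunk, so the prior influences appearance without adding a separate geometry path.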
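
Figure 4 lists an "occlusion" regularisation among ESNeRF's losses. Assuming it follows the FreeNeRF formulation that the challenge methods build on, it penalises density at the first $M$ samples along each ray, $\mathcal{L}_{occ} = \frac{1}{K}\sum_{k=1}^{K} \sigma_k m_k$ with $m_k = 1$ for $k \le M$, which discourages floaters close to the camera. The sketch covers only that assumed term; ESNeRF's colour and depth losses, loss weights and near-far field optimisation are not specified in this summary and are not reproduced. The name `occlusion_loss` and its parameters are hypothetical.

```python
import numpy as np


def occlusion_loss(sigma, reg_range):
    """Occlusion regularisation (assumed FreeNeRF-style formulation).

    sigma: (num_rays, K) densities at samples ordered from near to far.
    reg_range: number M of near-camera samples to penalise.
    Returns the mean over rays of (1/K) * sum_k sigma_k * m_k,
    where m_k = 1 for the first M samples and 0 otherwise.
    """
    sigma = np.asarray(sigma, dtype=np.float64)
    num_samples = sigma.shape[-1]
    m = (np.arange(num_samples) < reg_range).astype(np.float64)
    per_ray = np.sum(sigma * m, axis=-1) / num_samples
    return float(np.mean(per_ray))


if __name__ == "__main__":
    dense_everywhere = np.ones((2, 10))
    print(occlusion_loss(dense_everywhere, reg_range=3))  # 0.3
```

Density placed only beyond the first `reg_range` samples incurs no penalty, so the term pushes geometry away from the camera without constraining the rest of the ray.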