1st Place Solution for ICCV 2023 OmniObject3D Challenge: Sparse-View Reconstruction

Hang Du; Yaping Xue; Weidong Dai; Xuejun Yan; Jingjing Wang

TL;DR

The paper tackles sparse-view reconstruction on OmniObject3D: synthesizing novel views and reconstructing surfaces from as few as 1–3 posed images. It uses a Pixel-NeRF backbone pre-trained on a diverse set of objects, fine-tunes it separately on each test scene, and adds depth supervision and BARF-style coarse-to-fine positional encoding. The key findings are substantial gains from learning priors across 48 representative categories and from per-scene fine-tuning; the final model reaches a PSNR of 25.446 on the challenge test set. The authors take this as evidence that the approach generalizes well and is practical for efficient sparse-view 3D reconstruction on real-world, object-centric datasets.
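The summary names BARF-style coarse-to-fine positional encoding without giving its form, so the following is a minimal sketch that assumes BARF's published schedule. Frequency band k is multiplied by a weight that ramps smoothly from 0 to 1 as a training-progress value (here a hypothetical parameter `progress` in [0, 1]) sweeps past it. Low frequencies are therefore active first, and high frequencies are introduced gradually. The function names are illustrative and not taken from the authors' code.

```python
import numpy as np


def barf_weights(progress, num_freqs):
    """Per-band weights w_k = (1 - cos(pi * clip(alpha - k, 0, 1))) / 2 (BARF schedule)."""
    # alpha grows linearly from 0 to num_freqs as progress goes from 0 to 1
    alpha = np.clip(progress, 0.0, 1.0) * num_freqs
    k = np.arange(num_freqs)
    ramp = np.clip(alpha - k, 0.0, 1.0)
    return (1.0 - np.cos(ramp * np.pi)) / 2.0


def coarse_to_fine_encoding(x, num_freqs, progress):
    """Encode points x of shape (N, D); returns (N, D + 2 * num_freqs * D)."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi          # (L,)
    scaled = x[:, None, :] * freqs[None, :, None]           # (N, L, D)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)  # (N, L, 2D)
    # Mask each frequency band by its coarse-to-fine weight
    enc = enc * barf_weights(progress, num_freqs)[None, :, None]
    # Keep the raw coordinates so the network always sees the un-encoded input
    return np.concatenate([x, enc.reshape(x.shape[0], -1)], axis=-1)
```

During training, `progress` would typically be the current iteration divided by the length of a warm-up period, after which every band stays fully enabled.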

Abstract

In this report, we present the 1st place solution for the ICCV 2023 OmniObject3D Challenge: Sparse-View Reconstruction. The challenge evaluates approaches to novel view synthesis and surface reconstruction that use only a few posed images of each object. We use Pixel-NeRF as the base model and add depth supervision as well as coarse-to-fine positional encoding. Experiments show that our approach improves sparse-view reconstruction quality. We ranked first in the final test with a PSNR of 25.44614.
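The abstract does not state the exact form of the depth supervision. One common formulation, shown here as an assumption rather than the authors' loss, works in two steps. It first renders an expected depth per ray from the volume-rendering weights, then penalizes the squared error against a reference depth on pixels where that depth is valid. The names `render_weights`, `rendered_depth`, and `depth_loss` are hypothetical.

```python
import numpy as np


def render_weights(sigma, t_vals):
    """Volume-rendering weights for densities sigma and sample distances t_vals, both (R, S)."""
    deltas = np.diff(t_vals, axis=-1)
    # Treat the last interval as effectively infinite, as in the original NeRF
    deltas = np.concatenate([deltas, np.full_like(deltas[..., :1], 1e10)], axis=-1)
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Exclusive cumulative product gives transmittance T_i = prod_{j<i} (1 - alpha_j)
    ones = np.ones_like(alpha[..., :1])
    trans = np.cumprod(np.concatenate([ones, 1.0 - alpha], axis=-1), axis=-1)[..., :-1]
    return alpha * trans


def rendered_depth(sigma, t_vals):
    """Expected ray-termination depth per ray, shape (R,)."""
    return np.sum(render_weights(sigma, t_vals) * t_vals, axis=-1)


def depth_loss(pred_depth, ref_depth, valid_mask):
    """Mean squared depth error over pixels whose reference depth is valid."""
    sq_err = (pred_depth - ref_depth) ** 2
    return np.sum(sq_err * valid_mask) / max(np.sum(valid_mask), 1.0)
```

In a full model this term would be added to the photometric loss with some weighting factor; the weight is not given in this summary.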

Paper Structure (10 sections, 4 equations, 2 figures, 4 tables)

Introduction
Method
Depth Supervision
Coarse-to-fine Positional Encoding
Experiments
Data Preparation
Implementation Details
Results on Validation Set
Results on Challenge Test
Conclusion

Figures (2)

Figure 1: Overview of the Pixel-NeRF architecture (yu2021pixelnerf). For a query point $x$ along a target ray with view direction $d$, a corresponding feature is extracted from the image feature volume $W$ by projection and interpolation. This feature, together with the spatial coordinates, is fed into the NeRF model, which predicts RGB color values for neural rendering.
Figure 2: Qualitative comparisons on the challenge test set. For each object, the first row is rendered by the pre-trained model and the second row by the model fine-tuned on that object.
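The projection-and-interpolation step described in Figure 1 can be sketched as follows. The sketch assumes a pinhole camera with world-to-camera rotation `R`, translation `t`, and intrinsics `K`, plus a single-view feature map of shape (C, H, W). Out-of-bounds coordinates are clamped to the image border, which is an assumption of this sketch. The output concatenates camera-space coordinates with the sampled feature. The positional encoding and view direction that Pixel-NeRF also feeds to its MLP are omitted for brevity. All function names are hypothetical.

```python
import numpy as np


def project_points(points_world, R, t, K):
    """Project (N, 3) world points to pixel coordinates; assumes positive camera depth."""
    cam = points_world @ R.T + t          # world -> camera: X_c = R X_w + t
    uvw = cam @ K.T                       # apply intrinsics
    uv = uvw[:, :2] / uvw[:, 2:3]         # perspective division
    return uv, cam


def bilinear_sample(feat, uv):
    """Bilinearly sample feat (C, H, W) at pixel coords uv (N, 2) with u = column, v = row."""
    _, H, W = feat.shape
    u = np.clip(uv[:, 0], 0, W - 1)       # clamp to the image border
    v = np.clip(uv[:, 1], 0, H - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    du, dv = u - u0, v - v0
    # Each corner lookup has shape (C, N)
    out = (feat[:, v0, u0] * (1 - du) * (1 - dv) + feat[:, v0, u1] * du * (1 - dv)
           + feat[:, v1, u0] * (1 - du) * dv + feat[:, v1, u1] * du * dv)
    return out.T                          # (N, C)


def pixel_aligned_features(points_world, R, t, K, feat):
    """Concatenate camera-space coordinates with the pixel-aligned image feature: (N, 3 + C)."""
    uv, cam = project_points(points_world, R, t, K)
    return np.concatenate([cam, bilinear_sample(feat, uv)], axis=-1)
```

With several input views, each view would contribute one such feature per query point, and the per-view results would then be aggregated before color prediction.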
