ASSR-NeRF: Arbitrary-Scale Super-Resolution on Voxel Grid for High-Quality Radiance Fields Reconstruction

Ding-Jiun Huang, Zi-Ting Chou, Yu-Chiang Frank Wang, Cheng Sun

TL;DR

ASSR-NeRF tackles high-quality super-resolution novel view synthesis (SRNVS) from low-resolution (LR) training views by introducing an arbitrary-scale 3D SR framework. It comprises a voxel-based distilled feature field that carries 2D SR priors into 3D space and a generalizable VoxelGridSR module that applies density-distance-aware self-attention to refine radiance fields at arbitrary scales. The approach enables multi-view consistent SR without requiring HR reference views for each scene, and is trained across multiple scenes to generalize to unseen data. Experimental results on Synthetic-NeRF and BlendedMVS show state-of-the-art SRNVS performance, with clear improvements in texture detail, edge sharpness, and geometric consistency over existing NeRF-based and image SR methods. The results also highlight the method’s limitations in rendering speed and the need for robust multi-view benchmarks.

Abstract

NeRF-based methods reconstruct 3D scenes by building a radiance field with implicit or explicit representations. While NeRF-based methods can perform novel view synthesis (NVS) at arbitrary scale, high-resolution novel view synthesis (HRNVS) from a field optimized on low-resolution (LR) views often yields oversmoothed results. On the other hand, single-image super-resolution (SR) aims to enhance LR images to HR counterparts but lacks multi-view consistency. To address these challenges, we propose Arbitrary-Scale Super-Resolution NeRF (ASSR-NeRF), a novel framework for super-resolution novel view synthesis (SRNVS). We propose an attention-based VoxelGridSR model to directly perform 3D super-resolution (SR) on the optimized volume. Our model is trained on diverse scenes to ensure generalizability. For unseen scenes trained with LR views, we can then directly apply our VoxelGridSR to further refine the volume and achieve multi-view consistent SR. We demonstrate quantitatively and qualitatively that the proposed method achieves significant performance in SRNVS.
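
As a rough illustration of why a voxel-based radiance field can be queried at arbitrary scale, the sketch below samples a dense grid at a continuous coordinate with trilinear interpolation. It is a generic example, not the paper's implementation: the function name `trilinear_sample` and the nested-list grid layout are assumptions made for illustration. Because every query is a smooth blend of the eight surrounding voxels, sampling more finely than the grid was optimized for adds no new detail, which is one way to picture the oversmoothing described above.

```python
import math


def trilinear_sample(grid, x, y, z):
    """Sample a dense scalar voxel grid at a continuous point (x, y, z).

    grid[i][j][k] is the value stored at integer voxel corner (i, j, k).
    Coordinates are in voxel units and are clamped to the grid bounds.
    Each grid dimension must have at least two entries.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])

    # Clamp the query so its 2x2x2 neighbourhood stays inside the grid.
    x = min(max(x, 0.0), nx - 1.0)
    y = min(max(y, 0.0), ny - 1.0)
    z = min(max(z, 0.0), nz - 1.0)

    # Lower corner of the enclosing cell and fractional offsets within it.
    i0 = min(int(math.floor(x)), nx - 2)
    j0 = min(int(math.floor(y)), ny - 2)
    k0 = min(int(math.floor(z)), nz - 2)
    tx, ty, tz = x - i0, y - j0, z - k0

    # Blend the eight corner values with separable linear weights.
    value = 0.0
    for di, wx in ((0, 1.0 - tx), (1, tx)):
        for dj, wy in ((0, 1.0 - ty), (1, ty)):
            for dk, wz in ((0, 1.0 - tz), (1, tz)):
                value += wx * wy * wz * grid[i0 + di][j0 + dj][k0 + dk]
    return value


# A 2x2x2 grid holding the linear function i + 2j + 4k at its corners.
demo_grid = [[[i + 2 * j + 4 * k for k in range(2)] for j in range(2)]
             for i in range(2)]
center_value = trilinear_sample(demo_grid, 0.5, 0.5, 0.5)
```

Any continuous point can be queried this way, so rays for an image of any resolution can be marched through the same grid.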

Paper Structure (29 sections, 7 equations, 11 figures, 4 tables)


Figures (11)

  • Figure 1: Given a radiance field reconstructed from low-resolution (LR) training views, we perform radiance field super-resolution, leading to cleaner details in high-resolution (HR) rendered views.
  • Figure 1: Pipeline for dataset preprocessing: Given a raw training view, we first use Grounding DINO [liu2023grounding] to locate the target object, then use a Segment Anything Model (SAM) [kirillov2023segment] to segment it and generate a training view with an object mask.
  • Figure 2: Overview of ASSR-NeRF: Given a query point $x$ along a ray, view-dependent distilled features and densities of its nearest neighbors are first sampled from a distilled feature field. Then, the VoxelGridSR module aggregates the queried modalities and performs self-attention to produce a refined feature and density. Finally, a pre-trained decoder maps the refined feature to an RGB value $c$.
  • Figure 2: Comparison of multi-view consistency: Super-resolving LR novel views from Zip-NeRF [barron2023zipnerf] with StableSR [wang2023exploiting] leads to serious inconsistency across views from different camera poses. ASSR-NeRF can render HR novel views with consistent geometry and appearance. We encourage readers to watch our video showing the consistency issue at https://drive.google.com/file/d/1h8WjmN7r1R79Cd4Q-dRLgbhToMZNR3pz/view.
  • Figure 3: Distilled feature field: In a student-teacher setting, features extracted from training views are distilled into a 3D student network. The student network is trained by minimizing the difference between rendered features and features from a pre-trained image feature extractor, in addition to the difference between rendered colors and ground-truth pixel colors. FeatureNet turns voxel features into view-dependent distilled features, and a pre-trained decoder maps the view-dependent features to RGB color.
  • ...and 6 more figures
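
The neighbor aggregation described in the ASSR-NeRF overview caption can be pictured with the following minimal sketch. It is a simplified stand-in, not the learned VoxelGridSR module: the captions do not give the attention formulation, so the scoring rule (a softmax over negative distance plus log-density), the `temperature` parameter, and the names `softmax` and `refine_point` are all hypothetical choices made for illustration. There are no learned query, key, or value projections here, only a distance- and density-biased weighted average of neighbor features and densities.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_point(query_xyz, neighbors, temperature=1.0):
    """Aggregate neighbor features and densities for one query point.

    neighbors is a list of (xyz, density, feature) tuples, where xyz is a
    3-tuple, density is a non-negative float, and feature is a list of floats.
    Returns (refined_feature, refined_density).
    """
    scores = []
    for xyz, density, _feature in neighbors:
        distance = math.dist(query_xyz, xyz)
        # Hypothetical bias: closer and denser neighbors score higher.
        scores.append(-distance / temperature + math.log(density + 1e-6))
    weights = softmax(scores)

    # Weighted average of the neighbor features, one channel at a time.
    dim = len(neighbors[0][2])
    refined_feature = [
        sum(w * feature[d] for w, (_, _, feature) in zip(weights, neighbors))
        for d in range(dim)
    ]
    # The density is aggregated with the same weights.
    refined_density = sum(w * density
                          for w, (_, density, _) in zip(weights, neighbors))
    return refined_feature, refined_density


# Two equally dense neighbors placed symmetrically around the query point.
demo_neighbors = [
    ((1.0, 0.0, 0.0), 1.0, [1.0, 0.0]),
    ((-1.0, 0.0, 0.0), 1.0, [0.0, 1.0]),
]
demo_feature, demo_density = refine_point((0.0, 0.0, 0.0), demo_neighbors)
```

In the pipeline described by the caption, the refined feature would then be passed to the pre-trained decoder to obtain the RGB value; that decoder is not modeled here.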