GUMBEL-NERF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields

Yusuke Sekikawa, Chingwei Hsu, Satoshi Ikehata, Rei Kawakami, Ikuro Sato

TL;DR

Gumbel-NeRF adopts a hindsight expert selection mechanism, which guarantees continuity in the density field even near the experts' boundaries, and outperforms the baselines on the SRN cars dataset in terms of various image quality metrics.

Abstract

We propose Gumbel-NeRF, a mixture-of-experts (MoE) neural radiance field (NeRF) model with a hindsight expert selection mechanism for synthesizing novel views of unseen objects. Previous studies have shown that the MoE structure provides high-quality representations of a given large-scale scene consisting of many objects. However, we observe that such an MoE NeRF model often produces low-quality representations in the vicinity of experts' boundaries when applied to the task of novel view synthesis of an unseen object from one- or few-shot input. We find that this deterioration is primarily caused by the foresight expert selection mechanism, which may leave an unnatural discontinuity in the object shape near the experts' boundaries. Gumbel-NeRF adopts a hindsight expert selection mechanism, which guarantees continuity in the density field even near the experts' boundaries. Experiments using the SRN cars dataset demonstrate the superiority of Gumbel-NeRF over the baselines in terms of various image quality metrics.
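To make the continuity claim concrete, the following is a minimal NumPy sketch of hindsight selection, assuming every expert has already been evaluated at every sample point. The function name `hindsight_select`, the toy 1D "experts", and the array shapes are illustrative assumptions, not taken from the paper. Because the selected density is the pointwise maximum of continuous per-expert densities, it stays continuous where the winning expert changes.

```python
import numpy as np

def hindsight_select(densities, colors):
    """Pick, per point, the expert with the highest density.

    densities: (N, P) array, one row per expert (hypothetical layout).
    colors:    (N, P, 3) array of per-expert radiance.
    Returns (sigma, rgb, idx) with shapes (P,), (P, 3), (P,).
    """
    idx = np.argmax(densities, axis=0)        # winning expert per point
    pts = np.arange(densities.shape[1])
    return densities[idx, pts], colors[idx, pts], idx

# Two toy "experts" whose densities are smooth bumps along a 1D line.
x = np.linspace(-1.0, 1.0, 201)
densities = np.stack([np.exp(-8 * (x + 0.3) ** 2),
                      np.exp(-8 * (x - 0.3) ** 2)])
colors = np.stack([np.tile([1.0, 0.0, 0.0], (x.size, 1)),
                   np.tile([0.0, 0.0, 1.0], (x.size, 1))])

sigma, rgb, idx = hindsight_select(densities, colors)
```

In this toy setup the winning expert switches near `x = 0`, yet `sigma` has no jump there, since both candidate densities agree at the crossover. A foresight gate that assigns a point to an expert before any density is computed offers no such guarantee.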

Paper Structure

This paper contains 16 sections, 13 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Overview of Gumbel-NeRF. In the forward pass, a set of experts is processed to return densities and radiances. Out of $N$ experts, only the one with the highest density is selected. This maximum-pooling expert selection guarantees continuity in the final density field, as in the original NeRF. Each expert is associated with an expert-specific latent code so that it learns to model one part of the object.
  • Figure 2: (a) Architecture of Gumbel-NeRF. Trainable parameters, output vectors, and our proposed expert selection layer are shown in green, yellow, and pink boxes, respectively. FC refers to a fully connected layer and PE refers to positional encoding. $\textbf{z}^s_n$ denotes the expert-specific shape code for the $n$-th expert. Unlike Switch-NeRF [zhenxing2022switch], Gumbel-NeRF processes all experts in parallel to produce candidate densities $\sigma_{1\cdots N}$ and intermediate features $\textbf{h}^L_{1\cdots N}$. The layer $G$ samples the output from only one expert using the Gumbel-Max trick. (b) Scheduling of the temperature parameter. The temperature parameter used in the Gumbel-Max trick is scheduled to control the level of randomness throughout training. In the early stage, the temperature is set high so that all experts have a nearly equal chance of being selected (rival stage). In this stage, each expert obtains sufficient gradient updates, avoiding collapse (i.e., a vicious cycle in which only one expert obtains all the gradient updates and the other experts are underoptimized). Toward the end, the temperature is decreased to make the experts distinct (expert stage).
  • Figure 3: Qualitative results of novel view synthesis of unseen objects using one-shot test-time optimization. Compared to CodeNeRF (CN) and Coded Switch-NeRF (CSN), our Gumbel-NeRF (GN-C) generally produces higher-quality results, especially for the parts marked by red boxes.
  • Figure 4: Visualization of the decomposition provided by Coded Switch-NeRF (CSN) and Gumbel-NeRF (GN). Images in each column are rendered from only the 3D points handled by the corresponding expert.
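
The temperature-scheduled selection described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the captions do not say how the temperature enters the Gumbel-Max trick, so the sketch assumes the candidate densities act as scores divided by the temperature, and the exponential schedule with its `tau_start` and `tau_end` values is hypothetical.

```python
import numpy as np

def temperature(step, total_steps, tau_start=10.0, tau_end=0.1):
    """Hypothetical exponential decay from tau_start to tau_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return tau_start * (tau_end / tau_start) ** frac

def gumbel_max_select(scores, tau, rng):
    """Sample one expert index per point.

    scores: (N, P) array of per-expert scores (assumed here to be the
    candidate densities). Adding Gumbel(0, 1) noise to scores / tau and
    taking the argmax draws from softmax(scores / tau) over experts.
    """
    gumbel = rng.gumbel(size=scores.shape)
    return np.argmax(scores / tau + gumbel, axis=0)

rng = np.random.default_rng(0)
# Three experts with fixed scores 2, 1, 0, repeated over 20000 points.
scores = np.tile(np.array([[2.0], [1.0], [0.0]]), (1, 20000))

early = gumbel_max_select(scores, temperature(0, 1000), rng)
late = gumbel_max_select(scores, temperature(1000, 1000), rng)

early_freq = np.bincount(early, minlength=3) / early.size
late_freq = np.bincount(late, minlength=3) / late.size
```

With a high temperature the noise dominates, so `early_freq` is close to uniform and every expert receives gradient updates (the rival stage). With a low temperature the scores dominate, so `late_freq` concentrates on the highest-scoring expert and the selection approaches the deterministic maximum (the expert stage).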