NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs

Michael Fischer, Zhengqin Li, Thu Nguyen-Phuoc, Aljaz Bozic, Zhao Dong, Carl Marshall, Tobias Ritschel

TL;DR

This work introduces NeRF analogies, a framework for transferring appearance between NeRFs by leveraging semantic affinity from pretrained ViT features to align source appearance with a target geometry. It computes dense correspondences via DiNO-ViT on renderings and trains a NeRF analogy that combines target geometry with source appearance in a multiview-consistent manner, augmented by an edge-preserving regularizer. Empirically, NeRF analogies outperform traditional stylization and image-analogy baselines and are preferred in user studies, demonstrating robust transfer across real-world and synthetic scenes and across multi-object configurations. The approach enables practical 3D attribute transfer and opens avenues for 3D texture transfer and intrinsic parameter transfer in future work.

Abstract

A Neural Radiance Field (NeRF) encodes the specific relation of 3D geometry and appearance of a scene. We here ask the question whether we can transfer the appearance from a source NeRF onto a target 3D geometry in a semantically meaningful way, such that the resulting new NeRF retains the target geometry but has an appearance that is an analogy to the source NeRF. To this end, we generalize classic image analogies from 2D images to NeRFs. We leverage correspondence transfer along semantic affinity that is driven by semantic features from large, pre-trained 2D image models to achieve multi-view consistent appearance transfer. Our method allows exploring the mix-and-match product space of 3D geometry and appearance. We show that our method outperforms traditional stylization-based methods and that a large majority of users prefer our method over several typical baselines.
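
To make the idea of correspondence transfer along semantic affinity concrete, here is a minimal sketch, not the authors' implementation. It assumes that per-pixel features have already been extracted (for example with a pretrained ViT) and flattened into arrays, that affinity is measured by cosine similarity, and that each target point takes the colour of its single best-matching source point. The function names `cosine_affinity` and `transfer_appearance` are hypothetical and introduced only for illustration.

```python
import numpy as np

def cosine_affinity(target_feats, source_feats, eps=1e-8):
    """Pairwise cosine similarity, shape (num_target, num_source).

    Assumption: cosine similarity stands in for the semantic affinity
    mentioned in the text.
    """
    t = target_feats / (np.linalg.norm(target_feats, axis=1, keepdims=True) + eps)
    s = source_feats / (np.linalg.norm(source_feats, axis=1, keepdims=True) + eps)
    return t @ s.T

def transfer_appearance(target_feats, source_feats, source_rgb):
    """Assign each target point the RGB of its most similar source point.

    Assumption: the mapping is a hard argmax over affinities; other
    choices (soft weighting, top-k averaging) are equally plausible.
    """
    affinity = cosine_affinity(target_feats, source_feats)
    phi = affinity.argmax(axis=1)  # index of the best source match per target point
    return source_rgb[phi], phi

# Toy example: two source points with orthogonal features and distinct colours.
source_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
source_rgb = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
target_feats = np.array([[0.1, 0.9], [2.0, 0.1]])

target_rgb, phi = transfer_appearance(target_feats, source_feats, source_rgb)
```

In this toy setup the first target point is closest to the second source feature and the second target point to the first, so the colours are swapped accordingly. In a full pipeline, the transferred colours would serve as supervision for training a new NeRF on the target geometry.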

Paper Structure

This paper contains 22 sections, 5 equations, 14 figures, and 1 table.

Figures (14)

  • Figure 1: The main steps of our approach from left to right: We render both the target and source NeRF (first and second pair of rows) into a set of 2D images (first column), and then extract features (middle column). An image hence is a point cloud in feature space, where every point is labeled by 3D position, normal, view direction and appearance (third column). We use view direction, RGB and features of the source NeRF, and position, normal and features of the target, and gray-out unused channels. We then establish correspondence between the source and target features via the mapping $\phi$ in the lower right subplot, allowing us to transfer appearance from the source to the geometry of the target. Finally, we train our NeRF analogy $L_\theta$, which combines the target's geometry with the appearance from the source.
  • Figure 1: Comparison between DiNO- and SIFT-features.
  • Figure 2: DiNO affinity for various pixel queries (colored dots, columns) on various object pairs (rows), visualized as heatmap where blue and red correspond to 0 and 1, respectively.
  • Figure 2: A semantic transfer between a bowl of apples and a set of tennis balls, both encoded as SDF.
  • Figure 3: Self-similarity for a pixel query (the yellow point on the left image) for several variants of DiNO to illustrate the effects of feature resolution. Our version produces the most fine-granular features, as is visible in the rightmost image.
  • ...and 9 more figures