Adjustable Visual Appearance for Generalizable Novel View Synthesis
Josef Bengtson, David Nilsson, Che-Tsung Lin, Marcel Büsching, Fredrik Kahl
TL;DR
We address generalizable novel view synthesis with controllable appearance by extending a pretrained Generalizable NeRF Transformer (GNT) with a latent appearance variable $z_{c'}$ and an appearance-alignment objective. The method renders 3D-consistent novel views of unseen scenes, changes their appearance to match a target weather or lighting condition, and supports smooth interpolation in the appearance space. Key contributions are (i) a rendering pipeline conditioned on the latent appearance variable, (ii) a dedicated appearance loss that aligns renderings to target conditions, and (iii) a synthetic CARLA-based dataset with four appearance conditions for training and evaluation, plus demonstrations on real data (Spaces). Empirically, the approach outperforms 2D style transfer baselines and Instruct-NeRF2NeRF in rendering quality and temporal/multi-view consistency. It also enables appearance edits without scene-specific training and with fewer input images, which makes it practical for cross-scene appearance editing in VR/AR pipelines.
Abstract
We present a generalizable novel view synthesis method that modifies the visual appearance of an observed scene so that rendered views match a target weather or lighting condition, without any scene-specific training or access to reference views at the target condition. Our method is based on a pretrained generalizable transformer architecture and is fine-tuned on synthetically generated scenes under different appearance conditions. This allows novel views to be rendered consistently for 3D scenes that were not included in the training set, along with the ability to (i) modify their appearance to match the target condition and (ii) smoothly interpolate between different conditions. Qualitative and quantitative experiments on real and synthetic scenes show that our method generates 3D-consistent renderings while making realistic appearance changes. Please refer to our project page for video results: https://ava-nvs.github.io/
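To illustrate what interpolating between appearance conditions could look like, the following minimal sketch blends two latent appearance codes with a linear schedule. Both the linear blend and the function name `interpolate_appearance` are assumptions made for illustration; the text above does not specify how the interpolation is performed, and the example vectors are arbitrary placeholders rather than learned codes.

```python
import numpy as np


def interpolate_appearance(z_start, z_end, num_steps):
    """Return num_steps latent codes blended linearly from z_start to z_end.

    Hypothetical helper: assumes appearance conditions are represented as
    fixed-size latent vectors and that a linear blend is adequate.
    """
    z_start = np.asarray(z_start, dtype=np.float64)
    z_end = np.asarray(z_end, dtype=np.float64)
    if z_start.shape != z_end.shape:
        raise ValueError("latent codes must have the same shape")
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")

    # Blend weights run from 0 (pure start condition) to 1 (pure end condition).
    weights = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - w) * z_start + w * z_end for w in weights])


# Placeholder 2-D codes standing in for two appearance conditions.
codes = interpolate_appearance([0.0, 2.0], [1.0, 0.0], num_steps=3)
```

Each row of `codes` would then be passed to the appearance-conditioned renderer in place of a single fixed code, giving one rendering per intermediate condition.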
