
Latent Radiance Fields with 3D-aware 2D Representations

Chaoyi Zhou, Xi Liu, Feng Luo, Siyu Huang

TL;DR

This is the first work to show that radiance field representations constructed from 2D latent representations can yield photorealistic 3D reconstruction performance.

Abstract

Latent 3D reconstruction has shown great promise in empowering 3D semantic understanding and 3D generation by distilling 2D features into 3D space. However, existing approaches struggle with the domain gap between the 2D feature space and 3D representations, resulting in degraded rendering performance. To address this challenge, we propose a novel framework that integrates 3D awareness into the 2D latent space. The framework consists of three stages: (1) a correspondence-aware autoencoding method that enhances the 3D consistency of 2D latent representations, (2) a latent radiance field (LRF) that lifts these 3D-aware 2D representations into 3D space, and (3) a VAE-Radiance Field (VAE-RF) alignment strategy that improves image decoding from the rendered 2D representations. Extensive experiments demonstrate that our method outperforms state-of-the-art latent 3D reconstruction approaches in terms of synthesis performance and cross-dataset generalizability across diverse indoor and outdoor scenes. To our knowledge, this is the first work to show that radiance field representations constructed from 2D latent representations can yield photorealistic 3D reconstruction performance.
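
As a rough illustration of what a correspondence-aware constraint on latents could look like, the sketch below penalizes differences between latent vectors at matched locations in two views of the same scene. It is a minimal sketch under stated assumptions, not the paper's formulation: the function name `correspondence_consistency_loss`, the L1 distance, and the match format are all hypothetical, and the abstract does not specify how correspondences are obtained or how the loss is weighted.

```python
from typing import List, Sequence, Tuple

# Assumed layout: latent[row][col][channel]
Latent = List[List[List[float]]]
# Assumed match format: ((row_a, col_a), (row_b, col_b))
Match = Tuple[Tuple[int, int], Tuple[int, int]]


def correspondence_consistency_loss(
    latent_a: Latent, latent_b: Latent, matches: Sequence[Match]
) -> float:
    """Mean L1 distance between latent vectors at matched locations.

    Hypothetical helper: the L1 form and the match format are
    illustrative assumptions, not taken from the paper.
    """
    if not matches:
        return 0.0
    total = 0.0
    for (ra, ca), (rb, cb) in matches:
        feat_a = latent_a[ra][ca]
        feat_b = latent_b[rb][cb]
        # Average absolute difference over channels for this matched pair.
        total += sum(abs(x - y) for x, y in zip(feat_a, feat_b)) / len(feat_a)
    return total / len(matches)


# Toy 2x2 latents with 2 channels; view B is view A shifted right by one column.
view_a = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.2, 0.8]]]
view_b = [[[9.0, 9.0], [1.0, 0.0]], [[9.0, 9.0], [0.5, 0.5]]]

# Correct matches pair identical latent vectors, so the loss is zero.
good_matches = [((0, 0), (0, 1)), ((1, 0), (1, 1))]
consistent = correspondence_consistency_loss(view_a, view_b, good_matches)

# A wrong match pairs unrelated latent vectors and is penalized.
bad_matches = [((0, 0), (0, 0))]
inconsistent = correspondence_consistency_loss(view_a, view_b, bad_matches)
```

In this toy setup, latents that agree at corresponding locations incur no penalty, while mismatched latents do. A term of this kind, added to an autoencoder's training objective, is one plausible way to encourage the 3D consistency that Stage I aims for.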

Paper Structure

This paper contains 20 sections, 12 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: This work enables radiance field representations in the latent space of a VAE, achieving photorealistic 3D reconstruction performance on unbounded outdoor scenes.
  • Figure 2: An illustration of our pipeline for creating a latent radiance field in conjunction with 3D-aware 2D representation fine-tuning. In Stage I, we inject 3D awareness into the VAE’s encoder by applying a novel correspondence consistency constraint on the latent space, so that the 2D representations are geometrically consistent. In Stage II, we create the latent radiance field (LRF) to represent 3D scenes based on the 3D-aware 2D representations. In Stage III, we introduce a VAE-Radiance Field alignment method to improve image decoding from the rendered latent space.
  • Figure 3: A visualization of the latent spaces of the original VAE and our fine-tuned VAE. Our method preserves accurate geometry in the latent space while removing some high-frequency noise.
  • Figure 4: A visual comparison of rendering results. Our method not only renders high-quality images on the in-distribution dataset (DL3DV-10K) but also generalizes well across different datasets.
  • Figure 5: Visual comparison of different text-to-3D generation methods. Our model enables the generation of more view-consistent results.
  • ...and 7 more figures