A self-supervised cyclic neural-analytic approach for novel view synthesis and 3D reconstruction

Dragos Costea, Alina Marcu, Marius Leordeanu

TL;DR

The paper addresses the challenge of robust novel view synthesis and 3D reconstruction for UAV-driven outdoor scenes under sparse and noisy data. It introduces a self-supervised cyclic neural-analytic pipeline that fuses analytic structure-from-motion reconstructions with neural rendering, refined by a transformer-based image restoration module, and iteratively improves results without labeled data. Key contributions include a dual analytic-neural reconstruction framework, a cyclic refinement mechanism, and demonstrations that the approach yields improved RGB views and meshes across diverse, distant test poses, outperforming several state-of-the-art baselines. This method offers a scalable, scene-adaptive solution with strong potential for autonomous navigation, virtual/augmented reality, and robotic vision, while acknowledging computational overhead and opportunities for faster execution.

Abstract

Generating novel views from recorded videos is crucial for enabling autonomous UAV navigation. Recent advancements in neural rendering have facilitated the rapid development of methods capable of rendering new trajectories. However, these methods often fail to generalize well to regions far from the training data without an optimized flight path, leading to suboptimal reconstructions. We propose a self-supervised cyclic neural-analytic pipeline that combines high-quality neural rendering outputs with precise geometric insights from analytical methods. Our solution improves RGB and mesh reconstructions for novel view synthesis, especially in undersampled areas and regions that are completely different from the training dataset. We use a transformer-based architecture for image reconstruction to refine and adapt the synthesis process, enabling effective handling of novel, unseen poses without relying on extensive labeled datasets. Our findings demonstrate substantial improvements both in rendering novel views and in 3D reconstruction, which to the best of our knowledge is a first, setting a new standard for autonomous navigation in complex outdoor environments.

Paper Structure

This paper contains 12 sections, 2 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: An overview of our self-supervised cyclic neural-analytic pipeline for novel 2D view synthesis. We rely on both traditional and modern 3D reconstruction methods, which we combine through a self-supervised transformer-based U-Net-style model for improved image reconstruction. We employ an iterative learning procedure in which the outputs of one learning iteration become the inputs of the next, further refining the RGB and mesh results without any additional images. We work in the UAV video domain and hold out the last 20% of the image sequence for testing to simulate a more realistic reconstruction scenario. Original RGB frames from the TEST set are used exclusively for evaluation purposes.
  • Figure 2: PSNR on Slanic for each test frame. CNA consistently improves over the other baselines, despite not receiving additional RGB images. The gains hold not only on average but also on the vast majority of individual frames. Full results are shown in the supplementary material.
  • Figure 3: Evaluation of mesh reconstruction improvement for different scenes: training set only vs. training set plus generated set. We first obtain a ground-truth mesh from the combined training and testing sets. We then align it and compare it with (a) the mesh built from training images only and (b) the mesh built from training images plus generated test images (no additional information). A histogram over the error bins is shown. Our method (green) consistently improves over the mesh generated from training images only (red), significantly reducing the large 3D reconstruction errors.
  • Figure 4: Qualitative results. From left to right, RGB, Gaussian Splatting, CNA, CNA vs Gaussian Splatting error. Green means CNA has a smaller error. The errors from Gaussian Splatting are harder to spot due to the blending, but poorly seen regions and artifacts are better dealt with using CNA.
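
The evaluation protocol in the captions above (holding out the last 20% of the sequence, then comparing per-frame PSNR between methods) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names `chronological_split`, `psnr`, and `win_rate` are hypothetical, images are assumed to be flattened lists of floats in [0, 1], and the standard PSNR definition 10·log10(MAX² / MSE) is assumed.

```python
import math
from typing import List, Sequence, Tuple


def chronological_split(frames: Sequence, test_fraction: float = 0.2) -> Tuple[list, list]:
    """Hold out the last `test_fraction` of an ordered frame sequence for testing.

    Hypothetical helper: the split is purely by position in the sequence.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    n_test = max(1, round(len(frames) * test_fraction))
    cut = len(frames) - n_test
    return list(frames[:cut]), list(frames[cut:])


def psnr(reference: Sequence[float], rendered: Sequence[float], max_value: float = 1.0) -> float:
    """PSNR in dB between two flattened images with values in [0, max_value]."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and have the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def win_rate(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Fraction of frames on which method A scores strictly higher than method B."""
    if len(scores_a) != len(scores_b) or len(scores_a) == 0:
        raise ValueError("score lists must be non-empty and aligned per frame")
    wins = sum(1 for a, b in zip(scores_a, scores_b) if a > b)
    return wins / len(scores_a)


def per_frame_psnr(references: Sequence[Sequence[float]],
                   renders: Sequence[Sequence[float]]) -> List[float]:
    """PSNR for each (ground-truth, rendered) pair of test frames."""
    return [psnr(ref, out) for ref, out in zip(references, renders)]
```

Reporting `win_rate` alongside the mean PSNR is one way to check that a gain holds across most frames and is not driven by a few outliers, which is the kind of claim the Figure 2 caption makes.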