Hybrid bundle-adjusting 3D Gaussians for view consistent rendering with pose optimization

Yanan Guo, Ying Xie, Ying Chang, Benkui Zhang, Bo Jia, Lin Cao

TL;DR

A hybrid bundle-adjusting 3D Gaussians model is introduced that enables view-consistent rendering with pose optimization and can effectively optimize neural scene representations while simultaneously resolving significant camera pose misalignments.

Abstract

Novel view synthesis has made significant progress in the field of 3D computer vision. However, rendering view-consistent novel views from imperfect camera poses remains challenging. In this paper, we introduce a hybrid bundle-adjusting 3D Gaussians model that enables view-consistent rendering with pose optimization. This model jointly extracts image-based and neural 3D representations to simultaneously generate view-consistent images and camera poses within forward-facing scenes. The effectiveness of our model is demonstrated through extensive experiments conducted on both real and synthetic datasets. These experiments show that our model can effectively optimize neural scene representations while simultaneously resolving significant camera pose misalignments. The source code is available at https://github.com/Bistu3DV/hybridBA.

Paper Structure

This paper contains 22 sections, 10 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of the method. We first take a set of nearby views as input and extract pixel-level features for each image, as well as features for key anchor points in the scene. Next, we match these image features with the 3D anchor points generated from the point cloud, which enables the fusion of the image features with the anchor point features. To improve the accuracy of the camera parameters, we correct the camera poses through joint optimisation. Finally, we render the corrected data using 3DGS to synthesise high-quality novel views.
  • Figure 2: Representative input images of the glasses case. The input images contain a complete view of the glasses case and also cover its background from three different perspectives (top, middle and bottom), thus providing a full range of information.
  • Figure 3: We selected two scenes from the Tanks & Temples dataset (train and family), one scene from the CO3D dataset (hydrant), and one scene from our self-made dataset (glasses case) to conduct experiments and validate the effectiveness and generalization of our method.
  • Figure 4: Camera poses estimated by our method in four scenes. The first row shows the camera poses from our method without bundle adjustment; the second row shows the camera poses from our full method; and the third row shows the camera poses of the ground-truth (GT) images.
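
The matching step described in the Figure 1 caption depends on projecting 3D anchor points into each input view, so an error in the camera pose shifts where every anchor lands in the image. The following is a minimal sketch of that dependency, not the authors' implementation: it assumes a simple pinhole camera with a world-to-camera rotation and translation, and the function name `project_anchor` and all parameter names are hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def project_anchor(
    point_world: Vec3,
    rotation: Mat3,                # assumed world-to-camera rotation
    translation: Vec3,             # assumed world-to-camera translation
    focal: float,                  # hypothetical focal length in pixels
    center: Tuple[float, float],   # hypothetical principal point (cx, cy)
) -> Optional[Tuple[float, float]]:
    """Project one 3D anchor point to pixel coordinates.

    Returns None when the point lies on or behind the camera plane.
    """
    # Move the point into the camera frame: x_cam = R @ x_world + t
    cam = tuple(
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if cam[2] <= 0.0:
        return None
    # Pinhole projection onto the image plane
    u = focal * cam[0] / cam[2] + center[0]
    v = focal * cam[1] / cam[2] + center[1]
    return (u, v)


IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# The same anchor projected with an exact pose and with a slightly wrong pose.
anchor = (0.0, 0.0, 2.0)
exact = project_anchor(anchor, IDENTITY, (0.0, 0.0, 0.0), 100.0, (50.0, 50.0))
noisy = project_anchor(anchor, IDENTITY, (0.1, 0.0, 0.0), 100.0, (50.0, 50.0))
```

In this toy setup, a 0.1-unit translation error moves the projected anchor by several pixels, so features sampled at the projected location would come from the wrong part of the image. This is one way to see why the pose is optimised jointly with the scene representation rather than treated as fixed.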