MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views

Antoine Guédon, Tomoki Ichikawa, Kohei Yamashita, Ko Nishino

TL;DR

MAtCha Gaussians introduces a mesh-as-atlas representation where scene geometry is modeled as a collection of 2D charts, initialized from monocular depth and refined via a lightweight neural deformation model, with differentiable Gaussian surfel rendering to achieve photorealistic novel views from sparse images. The method explicitly optimizes geometry in 2D chart space, aligns charts with SfM points, and refines the surface with on-the-fly Gaussian surfels, enabling sharp mesh recovery for both foreground and background in unbounded scenes. It also presents two mesh extraction strategies—multi-resolution TSDF fusion and adaptive tetrahedralization—designed to preserve fine geometry without the distortions typical of volumetric approaches. Across bounded and unbounded datasets, MAtCha achieves state-of-the-art surface reconstruction quality and competitive or superior photorealistic rendering from very sparse input views, while dramatically reducing training time. The work offers a practical, scalable tool for applications in vision, graphics, and robotics that require explicit geometry alongside photorealism.

Abstract

We present a novel appearance model that simultaneously realizes explicit high-quality 3D surface mesh recovery and photorealistic novel view synthesis from sparse view samples. Our key idea is to model the underlying scene geometry Mesh as an Atlas of Charts which we render with 2D Gaussian surfels (MAtCha Gaussians). MAtCha distills high-frequency scene surface details from an off-the-shelf monocular depth estimator and refines them through Gaussian surfel rendering. The Gaussian surfels are attached to the charts on the fly, combining the photorealism of neural volumetric rendering with the crisp geometry of a mesh model, i.e., satisfying two seemingly contradictory goals in a single model. At the core of MAtCha lie a novel neural deformation model and a structure loss that preserve the fine surface details distilled from learned monocular depths while addressing their fundamental scale ambiguities. Results of extensive experimental validation demonstrate MAtCha's state-of-the-art quality of surface reconstruction and photorealism on par with top contenders, but with a dramatic reduction in the number of input views and computational time. We believe MAtCha will serve as a foundational tool for any visual application in vision, graphics, and robotics that requires explicit geometry in addition to photorealism. Our project page is the following: https://anttwo.github.io/matcha/
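
The scale ambiguity mentioned in the abstract can be illustrated with a much simpler baseline than the paper's neural deformation model: fitting one global scale and shift that maps monocular depths onto sparse SfM depths in the least-squares sense. The sketch below is not the authors' method. The function names and the sample values are hypothetical, and it assumes paired depth samples at the pixels where SfM points project.

```python
from typing import List, Sequence, Tuple


def fit_scale_shift(mono: Sequence[float], sfm: Sequence[float]) -> Tuple[float, float]:
    """Return (scale, shift) minimizing sum((scale * mono_i + shift - sfm_i)^2).

    Hypothetical helper: a global affine fit, NOT the paper's deformation model.
    """
    n = len(mono)
    if n < 2 or n != len(sfm):
        raise ValueError("need at least two paired depth samples")
    mean_m = sum(mono) / n
    mean_s = sum(sfm) / n
    var_m = sum((m - mean_m) ** 2 for m in mono)
    if var_m == 0.0:
        raise ValueError("monocular depths are constant; scale is undetermined")
    cov = sum((m - mean_m) * (d - mean_s) for m, d in zip(mono, sfm))
    scale = cov / var_m              # closed-form slope of the 1D regression
    shift = mean_s - scale * mean_m  # intercept
    return scale, shift


def apply_scale_shift(depth_map: List[List[float]], scale: float, shift: float) -> List[List[float]]:
    """Apply the fitted affine correction to every pixel of a depth map."""
    return [[scale * d + shift for d in row] for row in depth_map]


# Made-up samples: monocular depths at SfM keypoints and the matching SfM depths.
mono_samples = [1.0, 2.0, 3.0, 4.0]
sfm_samples = [2.5, 4.5, 6.5, 8.5]
scale, shift = fit_scale_shift(mono_samples, sfm_samples)
aligned = apply_scale_shift([[1.0, 2.0], [3.0, 4.0]], scale, shift)
```

A single global scale and shift cannot correct spatially varying depth errors, which is the gap a learned deformation of the charts is meant to close.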

Paper Structure

This paper contains 41 sections, 12 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: We propose MAtCha Gaussians, a novel surface representation for reconstructing high-quality 3D meshes with photorealistic rendering from sparse-view images. Our key idea is to model the underlying scene geometry as an Atlas of Charts in 2D image planes, which we render with 2D Gaussian surfels. We initialize the charts with a monocular depth estimation model and refine them using differentiable Gaussian rendering and a lightweight neural chart deformation model. Combined with a sparse-view SfM model like MASt3R-SfM [duisterhof2024mast3rsfm], MAtCha can recover sharp and accurate surface meshes of both foreground and background objects in unbounded scenes within minutes, from a few unposed RGB images. We used 3 training views for the leftmost example and 10 views for the rest.
  • Figure 2: Overview of MAtCha Gaussians. Given a few RGB images and their camera poses obtained using a sparse-view SfM method such as MASt3R-SfM [duisterhof2024mast3rsfm], we first initialize charts using a pretrained monocular depth estimation model. Each chart is represented as a mesh equipped with a UV map, mapping a 2D plane to the 3D surface. We then optimize our charts and enforce their alignment with input SfM data using two key components: (1) 1D depth encodings for quickly aligning the initial depth maps with one another, and (2) chart encodings for efficiently deforming the geometry while preserving surface details. Our aligned charts provide a sharp, dense, and accurate estimate of the 3D scene, which can be further refined using input images and a Gaussian Splatting-based rendering pipeline. Our representation allows for reconstructing high-quality surface meshes within minutes, even in sparse-view scenarios.
  • Figure 3: Reconstruction with different numbers of input views. Our method can produce high-quality renderings (top) and surfaces (bottom) even with very sparse input views (3-10 views). The meshes remain visually pleasing even in extremely sparse scenarios.
  • Figure 4: Comparison between our two mesh extraction methods: multi-resolution TSDF fusion (left) and adaptive tetrahedralization (right). We optimized MAtCha Gaussians representations with only 10 training images. Unlike vanilla TSDF fusion, our multi-resolution TSDF can reconstruct both foreground and background objects with a reasonable number of vertices. However, like vanilla TSDF fusion, it produces eroded meshes with holes in the surface, as well as "disk-aliasing" artifacts. In contrast, our adaptive tetrahedralization, inspired by GOF [yu2024gaussian], is able to reconstruct accurate and complete surface meshes (see top right image) with sharp and fine details (see bottom right image).
  • Figure 5: Comparisons with Spurfies [raj2024spurfies] and MVSplat [chen2024mvsplat] on an unbounded scene. Our method outperforms state-of-the-art approaches for surface reconstruction and feed-forward Gaussian splatting regression in sparse-view scenarios.
  • ...and 3 more figures
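
The Figure 2 caption describes each chart as a mesh equipped with a UV map that is initialized from a depth map. A minimal sketch of one way to build such a chart is shown below, assuming a pinhole camera with hypothetical intrinsics (fx, fy, cx, cy) and one vertex per pixel. The function name `depth_to_chart` and the triangulation pattern are illustrative choices, not taken from the paper.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]
Face = Tuple[int, int, int]


def depth_to_chart(
    depth: List[List[float]], fx: float, fy: float, cx: float, cy: float
) -> Tuple[List[Vec3], List[Vec2], List[Face]]:
    """Backproject a depth map into a triangle mesh with per-vertex UVs.

    Assumes a pinhole camera (an assumption, not stated in the source) and a
    depth map of at least 2x2 pixels. Vertices are in camera coordinates.
    """
    h, w = len(depth), len(depth[0])
    if h < 2 or w < 2:
        raise ValueError("depth map must be at least 2x2")
    vertices: List[Vec3] = []
    uvs: List[Vec2] = []
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            # Pinhole backprojection of pixel (u, v) at depth z.
            vertices.append(((u - cx) / fx * z, (v - cy) / fy * z, z))
            # UV coordinates are the normalized pixel position in [0, 1]^2.
            uvs.append((u / (w - 1), v / (h - 1)))
    faces: List[Face] = []
    for v in range(h - 1):
        for u in range(w - 1):
            i = v * w + u  # index of the top-left vertex of this pixel quad
            faces.append((i, i + 1, i + w))          # upper-left triangle
            faces.append((i + 1, i + w + 1, i + w))  # lower-right triangle
    return vertices, uvs, faces


# Toy 2x3 depth map at constant depth 2.0 with made-up intrinsics.
vertices, uvs, faces = depth_to_chart(
    [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]], fx=1.0, fy=1.0, cx=1.0, cy=0.5
)
```

Because the UV coordinates coincide with the image plane, any per-pixel quantity (depth corrections, deformation fields) can be stored as a 2D map and looked up directly on the mesh, which is the appeal of optimizing geometry in chart space.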