AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos

Yuze He, Wang Zhao, Shaohui Liu, Yubin Hu, Yushi Bai, Yu-Hui Wen, Yong-Jin Liu

TL;DR

The paper tackles the challenge of reconstructing complete, accurate 3D planar surfaces from monocular video. It introduces AlphaTablets, a generic 3D plane representation that encodes planes as rectangles with learnable alpha channels to capture both solid surfaces and irregular boundaries, coupled with differentiable rasterization for image formation. A bottom-up pipeline initializes many small AlphaTablets from 2D superpixels and monocular cues, then jointly optimizes geometry, texture, and alpha via rendering-based losses and a hierarchical merging scheme to form larger planes. Extensive experiments on ScanNet show state-of-the-art 3D planar reconstruction and meaningful plane-based scene editing, highlighting AlphaTablets’ potential as a versatile 3D plane representation for downstream tasks in vision and graphics.

Abstract

We introduce AlphaTablets, a novel and generic representation of 3D planes that features a continuous 3D surface and precise boundary delineation. By representing 3D planes as rectangles with alpha channels, AlphaTablets combine the advantages of current 2D and 3D plane representations, enabling accurate, consistent and flexible modeling of 3D planes. We derive differentiable rasterization on top of AlphaTablets to efficiently render 3D planes into images, and propose a novel bottom-up pipeline for 3D planar reconstruction from monocular videos. Starting with 2D superpixels and geometric cues from pre-trained models, we initialize 3D planes as AlphaTablets and optimize them via differentiable rendering. An effective merging scheme is introduced to facilitate the growth and refinement of AlphaTablets. Through iterative optimization and merging, we reconstruct complete and accurate 3D planes with solid surfaces and clear boundaries. Extensive experiments on the ScanNet dataset demonstrate state-of-the-art performance in 3D planar reconstruction, underscoring the great potential of AlphaTablets as a generic 3D plane representation for various applications. The project page is available at: https://hyzcluster.github.io/alphatablets
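
To make the representation more concrete, here is a minimal sketch of what "a rectangle with an alpha channel" could look like as a data structure, and how overlapping tablets along one camera ray might be blended. It is an illustration only, not the authors' implementation: the class name `Tablet`, the field names (`center`, `ratio`, and so on), the texel-centre convention, and the use of standard front-to-back alpha compositing are all assumptions made for this example.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Tablet:
    """Hypothetical container for one textured, semi-transparent rectangle."""
    center: np.ndarray  # (3,) 3D position of the rectangle centre
    normal: np.ndarray  # (3,) plane normal
    up: np.ndarray      # (3,) up vector, need not be exactly orthogonal to normal
    ratio: float        # 3D units per texel (assumed meaning of the distance ratio)
    color: np.ndarray   # (H, W, 3) texture
    alpha: np.ndarray   # (H, W) opacity in [0, 1]

    def basis(self):
        """Return orthonormal (right, up, normal) axes of the tablet."""
        n = self.normal / np.linalg.norm(self.normal)
        u = self.up - np.dot(self.up, n) * n  # drop the component along n
        u = u / np.linalg.norm(u)
        r = np.cross(u, n)                    # right-handed: r x u = n
        return r, u, n

    def texel_to_world(self, row, col):
        """Map the centre of texel (row, col) to a 3D point on the plane."""
        h, w = self.alpha.shape
        r, u, _ = self.basis()
        dx = (col + 0.5 - w / 2) * self.ratio  # offset along the right axis
        dy = (h / 2 - (row + 0.5)) * self.ratio  # row 0 is the top edge
        return self.center + dx * r + dy * u


def composite(colors, alphas):
    """Front-to-back alpha compositing of samples sorted near to far."""
    out = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors, alphas):
        out += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= 1.0 - a
    return out


tablet = Tablet(
    center=np.array([0.0, 0.0, 2.0]),
    normal=np.array([0.0, 0.0, 1.0]),
    up=np.array([0.0, 1.0, 0.0]),
    ratio=0.1,
    color=np.ones((4, 4, 3)),
    alpha=np.ones((4, 4)),
)
corner = tablet.texel_to_world(0, 0)  # top-left texel centre in 3D
blended = composite([[1, 0, 0], [0, 1, 0]], [0.5, 1.0])
```

Because every operation here is a smooth function of the geometry, texture, and alpha values, a formulation along these lines is compatible with gradient-based optimization, which is the property the abstract relies on when it speaks of differentiable rendering.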

Paper Structure

This paper contains 18 sections, 9 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Illustration of tablet properties and rendering. The normal and up vectors determine the rotation of a tablet in 3D space, while every tablet maintains a distance ratio between coordinates in 3D space and in 2D pixel space.
  • Figure 2: Pipeline of our proposed 3D planar reconstruction. Given a monocular video as input, we first initialize AlphaTablets using off-the-shelf superpixel, depth, and normal estimation models. The 3D AlphaTablets are then optimized through photometric guidance, followed by the merging scheme. This iterative process of optimization and merging refines the 3D AlphaTablets, resulting in accurate and complete 3D planar reconstruction.
  • Figure 3: Qualitative results on ScanNet. Error maps are included. Better viewed when zoomed in.
  • Figure 4: Qualitative results on TUM-RGBD and Replica datasets.
  • Figure 5: 3D scene editing examples of our method.
  • ...and 6 more figures