Complet4R: Geometric Complete 4D Reconstruction

Weibang Wang, Kenan Li, Zhuoguang Chen, Yijun Yuan, Hang Zhao

Abstract

We introduce Complet4R, a novel end-to-end framework for Geometric Complete 4D Reconstruction, a task that aims to recover temporally coherent and geometrically complete reconstructions of dynamic scenes. Our method formalizes this task as a unified framework of reconstruction and completion by directly accumulating full context onto each frame. Unlike previous approaches that rely on pairwise reconstruction or local motion estimation, Complet4R uses a decoder-only transformer that processes all context globally, directly from sequential video input, and reconstructs the complete geometry at every timestamp, including regions that are occluded in the current frame but visible in other frames. Our method achieves state-of-the-art performance on our proposed benchmark for Geometric Complete 4D Reconstruction and on the 3D Point Tracking task. Code will be released to support future research.

Paper Structure

This paper contains 41 sections, 14 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Complete and Consistent 4D Reconstruction. Our model, Complet4R, aggregates 3D point maps from all frames into a specific timestamp, forming a complete geometric representation that recovers occluded regions visible from other views. By alternating the aggregated timestamp, our method achieves complete and consistent 4D reconstruction, producing temporally coherent and geometrically complete representations directly from sequential video input.
  • Figure 2: Geometric Complete 4D Reconstruction from observations. Given input frames, Complet4R aggregates contextual information across all timestamps. Consequently, at each timestamp $T$, the reconstructed scene combines the geometry observed in frame $T$ with complementary geometry from all other frames.
  • Figure 3: Architecture Overview. By concatenating special aggregation tokens, Complet4R identifies the specific timestamp for aggregation. The Aggregation head then outputs the positions of 3D points from other views at this timestamp, aggregating 3D point maps across frames to form a complete geometric representation.
  • Figure 4: Qualitative Results for 4D Complete Reconstruction. The first column shows the video inputs, with red boxes indicating the target aggregation timestamp for each sequence (Agg.: aggregation). The subsequent columns present the outputs of different models. Our method successfully reconstructs the complete geometry at the target timestamp, as highlighted by the red ellipses, whereas other methods produce incomplete or geometrically inconsistent reconstructions.
  • Figure 5: Qualitative Results for 3D Dynamic Point Tracking. The first column shows the input images; the second and third columns display the tracking trajectories produced by our method at successive time steps. The smooth trajectories demonstrate strong spatiotemporal geometric consistency.
  • ...and 3 more figures
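
The aggregation idea described in the captions of Figures 1 to 3 can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hypothetical dictionary `predicted` in which `predicted[(source, target)]` holds the 3D points seen in frame `source`, expressed at timestamp `target` (a stand-in for what the Aggregation head would output). The function name `aggregate_at_timestamp` and the toy data are likewise illustrative assumptions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def aggregate_at_timestamp(
    predicted: Dict[Tuple[int, int], List[Point]],
    num_frames: int,
    target: int,
) -> List[Point]:
    """Merge the points of every source frame, each expressed at `target`.

    `predicted` is a hypothetical stand-in for the model output:
    predicted[(source, target)] lists the 3D points observed in frame
    `source`, placed where they are at timestamp `target`.
    """
    merged: List[Point] = []
    for source in range(num_frames):
        # Points from other frames fill in regions occluded in the target frame.
        merged.extend(predicted[(source, target)])
    return merged


# Toy data (assumed): 2 frames, one point per frame, moving along x over time.
toy = {
    (0, 0): [(0.0, 0.0, 1.0)],
    (1, 0): [(0.5, 0.0, 1.0)],
    (0, 1): [(0.1, 0.0, 1.0)],
    (1, 1): [(0.6, 0.0, 1.0)],
}

# Complete geometry at timestamp 0: points from frame 0 and frame 1.
complete_t0 = aggregate_at_timestamp(toy, num_frames=2, target=0)

# Alternating the target timestamp yields one complete point set per timestamp.
complete_4d = [aggregate_at_timestamp(toy, num_frames=2, target=t) for t in range(2)]
```

In this sketch the merge is a plain concatenation; how the actual model predicts the per-timestamp positions is described in the paper's method sections, which are not part of this excerpt.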