PuzzleFusion++: Auto-agglomerative 3D Fracture Assembly by Denoise and Verify

Zhengqing Wang, Jiacheng Chen, Yasutaka Furukawa

TL;DR

PuzzleFusion++ addresses 3D fracture assembly with an auto-agglomerative pipeline that iteratively aligns and merges fragments. It combines a diffusion-based $SE(3)$ denoiser with a transformer-based verifier to progressively form larger fragments across up to six iterations, emulating human puzzle-solving strategies. On the Breaking Bad dataset, it outperforms prior state-of-the-art methods on all metrics, improving part accuracy ($PA$) by over 10% and Chamfer distance by 50%, and its design choices are examined through extensive ablations and qualitative analyses. The work offers a fully neural, end-to-end framework with potential impact on archaeology, forensics, and related fields requiring accurate 3D reconstruction from fragmented objects.

Abstract

This paper proposes a novel "auto-agglomerative" 3D fracture assembly method, PuzzleFusion++, resembling how humans solve challenging spatial puzzles. Starting from individual fragments, the approach 1) aligns and merges fragments into larger groups akin to agglomerative clustering and 2) repeats the process iteratively in completing the assembly akin to auto-regressive methods. Concretely, a diffusion model denoises the 6-DoF alignment parameters of the fragments simultaneously, and a transformer model verifies and merges pairwise alignments into larger ones, whose process repeats iteratively. Extensive experiments on the Breaking Bad dataset show that PuzzleFusion++ outperforms all other state-of-the-art techniques by significant margins across all metrics, in particular by over 10% in part accuracy and 50% in Chamfer distance. The code will be available on our project page: https://puzzlefusion-plusplus.github.io.
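
The denoise, verify, and merge loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `denoise` and `verify` callables are hypothetical stand-ins for the diffusion denoiser and the transformer verifier, the pose tuple is a placeholder for 6-DoF parameters, and the union-find merge is one plausible way to group verified pairs. The cap of six iterations follows the TL;DR.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Pose = Tuple[float, ...]  # placeholder for 6-DoF alignment parameters (assumption)


def auto_agglomerative_assembly(
    num_fragments: int,
    denoise: Callable[[List[Set[int]]], List[Pose]],
    verify: Callable[[Set[int], Set[int], Pose, Pose], bool],
    max_iters: int = 6,
) -> List[Set[int]]:
    """Iteratively align groups of fragments and merge the verified pairs."""
    # Start with every fragment in its own group.
    groups: List[Set[int]] = [{i} for i in range(num_fragments)]
    for _ in range(max_iters):
        if len(groups) == 1:
            break
        # Hypothetical denoiser: one pose per current group, estimated jointly.
        poses = denoise(groups)

        # Union-find over group indices to merge verified pairwise alignments.
        parent = list(range(len(groups)))

        def find(x: int) -> int:
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        merged_any = False
        for a, b in combinations(range(len(groups)), 2):
            # Hypothetical verifier: accept or reject the alignment of a pair.
            if verify(groups[a], groups[b], poses[a], poses[b]):
                ra, rb = find(a), find(b)
                if ra != rb:
                    parent[rb] = ra
                    merged_any = True
        if not merged_any:
            break  # nothing was verified, so further iterations cannot help here

        merged: Dict[int, Set[int]] = {}
        for idx, group in enumerate(groups):
            merged.setdefault(find(idx), set()).update(group)
        groups = list(merged.values())
    return groups


# Toy stand-ins (assumptions for illustration only).
def toy_denoise(groups: List[Set[int]]) -> List[Pose]:
    return [(0.0,) * 6 for _ in groups]


def toy_verify(ga: Set[int], gb: Set[int], pa: Pose, pb: Pose) -> bool:
    # Pretend fragments with consecutive ids share a fracture surface.
    return any(abs(i - j) == 1 for i in ga for j in gb)


result = auto_agglomerative_assembly(5, toy_denoise, toy_verify)
```

In this toy setup every neighboring pair is accepted, so all five fragments end up in one group after a single pass. A stricter verifier would leave several groups for later iterations to re-align, which is the point of repeating the process.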

Paper Structure (21 sections, 3 equations, 17 figures, 5 tables)


Figures (17)

  • Figure 1: PuzzleFusion++ iteratively aligns and assembles fracture fragments into a 3D shape, resembling how humans solve jigsaw puzzles. At each iteration, a diffusion model solves for the 6-DoF alignments of the fragments, and a transformer verifies the pairwise alignments and merges them into larger fragments. We call our approach "auto-agglomerative," referring to auto-regressive methods for the iterative process and agglomerative clustering for the hierarchical grouping.
  • Figure 2: The architecture overview of PuzzleFusion++ (mesh used only for visualization). Left: An illustration of the auto-agglomerative fracture assembly process with the first two iterations. Right: Close-ups of the SE(3) denoise transformer and the pairwise alignment verifier transformer. Please refer to Figure 3 for the details of the architectures.
  • Figure 3: Inference pipeline with architecture specifications. The denoiser (blue) is a diffusion model with Transformer architecture at its core. The verifier (orange) is a Transformer.
  • Figure 4: Qualitative comparisons on the Breaking Bad dataset (mesh used only for visualization). We transform the assembled objects to the same coordinate system and normalize their sizes for clearer visualization. For each result, we show the number of successfully assembled fragments versus the total number of fragments (i.e., the part accuracy metric). Please see the Appendix for additional results.
  • Figure 5: Visualization of the assembly process in the first three auto-agglomerative iterations (mesh used only for visualization). We show two challenging examples with more than 9 fragments. The number of successfully assembled fragments increases as the system runs for more iterations.
  • ...and 12 more figures
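
The part accuracy metric mentioned in the Figure 4 caption, together with the Chamfer distance cited in the abstract, can be sketched as follows. This is an illustrative sketch, not the paper's evaluation code. It assumes that a fragment counts as successfully assembled when the Chamfer distance between its predicted and ground-truth point sets falls below a threshold, and the default of 0.01 is an assumed value. The Chamfer distance here uses squared Euclidean distances averaged in both directions, which is one common convention. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbor squared distance, both ways."""
    d_xy = sum(min(sq_dist(x, y) for y in ys) for x in xs) / len(xs)
    d_yx = sum(min(sq_dist(y, x) for x in xs) for y in ys) / len(ys)
    return d_xy + d_yx


def part_accuracy(
    pred_parts: List[Sequence[Point]],
    gt_parts: List[Sequence[Point]],
    threshold: float = 0.01,  # assumed threshold, not taken from the text above
) -> float:
    """Fraction of fragments whose posed points lie close to the ground truth."""
    correct = sum(
        1
        for pred, gt in zip(pred_parts, gt_parts)
        if chamfer_distance(pred, gt) < threshold
    )
    return correct / len(gt_parts)


# Toy example: the first fragment is placed exactly, the second is off by one unit in z.
gt = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
pred = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]]
pa = part_accuracy(pred, gt)
```

With these toy inputs one of the two fragments passes the threshold, so `pa` is 0.5, matching the "assembled versus total" count shown per result in Figure 4.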