DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly

Gianluca Scarpellini, Stefano Fiorini, Francesco Giuliari, Pietro Morerio, Alessio Del Bue

TL;DR

This paper introduces DiffAssemble, a Graph Neural Network (GNN)-based architecture that learns to solve reassembly tasks using a diffusion model formulation.

Abstract

Reassembly tasks play a fundamental role in many fields, and multiple approaches exist to solve specific reassembly problems. In this context, we posit that a general unified model can effectively address them all, irrespective of the input data type (images, 3D, etc.). We introduce DiffAssemble, a Graph Neural Network (GNN)-based architecture that learns to solve reassembly tasks using a diffusion model formulation. Our method treats the elements of a set, whether 2D patches or 3D object fragments, as nodes of a spatial graph. Training is performed by introducing noise into the position and rotation of the elements and iteratively denoising them to reconstruct the coherent initial pose. DiffAssemble achieves state-of-the-art (SOTA) results in most 2D and 3D reassembly tasks and is the first learning-based approach that solves 2D puzzles for both rotation and translation. Furthermore, we highlight its remarkable reduction in run-time: it runs 11 times faster than the quickest optimization-based method for puzzle solving. Code is available at https://github.com/IIT-PAVIS/DiffAssemble
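The noising step described in the abstract can be illustrated with a minimal sketch of a standard DDPM-style forward process applied to 2D piece poses. This is not the authors' implementation: the linear noise schedule, the pose layout `(x, y, cos θ, sin θ)`, and the names `linear_alpha_bar` and `noise_poses` are all assumptions made for illustration.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The schedule and its default values are assumptions, not taken from the paper.
    """
    alpha_bars = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_poses(poses, t, alpha_bars, rng):
    """Sample noisy poses at step t (1-indexed) from the clean poses.

    Each pose is a hypothetical 4-vector (x, y, cos_theta, sin_theta).
    Uses the closed form x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps.
    Returns the noisy poses and the noise that was added.
    """
    a_bar = alpha_bars[t - 1]
    signal, sigma = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    noisy, eps_all = [], []
    for pose in poses:
        eps = [rng.gauss(0.0, 1.0) for _ in pose]
        noisy.append([signal * p + sigma * e for p, e in zip(pose, eps)])
        eps_all.append(eps)
    return noisy, eps_all


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = linear_alpha_bar(1000)
    # Two pieces: one at the origin with no rotation, one shifted and rotated by 90 degrees.
    clean = [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]]
    slightly_noisy, _ = noise_poses(clean, 1, alpha_bars, rng)
    fully_noisy, _ = noise_poses(clean, 1000, alpha_bars, rng)
    print(slightly_noisy)
    print(fully_noisy)
```

In a training loop of this kind, a network would receive the noisy poses together with per-piece features and be trained to recover the clean poses or the added noise. Which of the two targets DiffAssemble uses is not stated in this excerpt.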

Paper Structure (40 sections, 8 equations, 8 figures, 7 tables)


Figures (8)

  • Figure 1: We propose DiffAssemble as a unified approach to reassembly tasks in two and three dimensions. DiffAssemble processes the elements to reassemble as a graph and infers their correct position and orientation in 2D and 3D space.
  • Figure 2: Framework of our proposed DiffAssemble for reassembly tasks, shown here for the 3D task. Following the Diffusion Probabilistic Models formulation, we model a Markov chain in which we inject noise into the pieces' position and orientation. At timestep $t = 0$, the pieces are in their correct position, and at timestep $t = T$, they are in a random position with random orientation. At each timestep $t$, our attention-based GNN takes as input a graph where each node contains an equivariant feature that describes a particular piece and its position and orientation. The network then predicts a less noisy version of the piece's position and orientation.
  • Figure 3: Qualitative results on Breaking Bad, showing the reassembly results for a broken wine glass and a wine bottle. We compare the results against SE(3)-Equiv [wu2023leveraging], which is the current SOTA method. All results are in the same reference frame, shifted horizontally so they do not overlap. The results are rendered with glass materials to make overlapping pieces easier to see.
  • Figure 4: (a) Patch alignment in inference from $t=T$ to $t=0$. (b) Qualitative comparison with $30\%$ missing pieces.
  • Figure 5: GPU memory consumption by total size.
  • ...and 3 more figures