FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos

Florian Langer, Jihong Ju, Georgi Dikov, Gerhard Reitmayr, Mohsen Ghafoorian

TL;DR

This work proposes FastCAD, a real-time method that simultaneously retrieves and aligns CAD models for all objects in a given scene. It achieves high-quality shape retrievals by learning CAD embeddings in a contrastive learning framework and distilling them into FastCAD.

Abstract

Digitising the 3D world into a clean, CAD model-based representation has important applications for augmented reality and robotics. Current state-of-the-art methods are computationally intensive as they individually encode each detected object and optimise CAD alignments in a second stage. In this work, we propose FastCAD, a real-time method that simultaneously retrieves and aligns CAD models for all objects in a given scene. In contrast to previous works, we directly predict alignment parameters and shape embeddings. We achieve high-quality shape retrievals by learning CAD embeddings in a contrastive learning framework and distilling those into FastCAD. Our single-stage method accelerates the inference time by a factor of 50 compared to other methods operating on RGB-D scans while outperforming them on the challenging Scan2CAD alignment benchmark. Furthermore, our approach integrates seamlessly with online 3D reconstruction techniques. This enables the real-time generation of precise CAD model-based reconstructions from videos at 10 FPS. In doing so, we significantly improve the Scan2CAD alignment accuracy in the video setting from 43.0% to 48.2% and the reconstruction accuracy from 22.9% to 29.6%.
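The retrieval step can be illustrated with a small sketch. The Figure 2 caption describes it as a nearest-neighbour lookup of the predicted shape embedding in a previously learned embedding space. The code below is only a toy illustration under stated assumptions: the Euclidean distance metric, the function names `rank_cad_models` and `retrieve_nth`, the CAD ids and the 2D embeddings are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_cad_models(
    query: Sequence[float],
    cad_embeddings: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (cad_id, distance) pairs sorted from nearest to farthest.

    Assumption: Euclidean distance; the paper may use a different metric.
    """
    ranked = [
        (cad_id, math.dist(query, embedding))
        for cad_id, embedding in cad_embeddings.items()
    ]
    ranked.sort(key=lambda pair: pair[1])
    return ranked


def retrieve_nth(
    query: Sequence[float],
    cad_embeddings: Dict[str, Sequence[float]],
    n: int = 1,
) -> str:
    """Return the id of the n-th nearest CAD model (n=1 is the nearest)."""
    return rank_cad_models(query, cad_embeddings)[n - 1][0]


# Toy 2D embedding space with hypothetical CAD ids.
cad_embeddings = {
    "chair_a": (1.0, 0.0),
    "chair_b": (0.9, 0.2),
    "table_a": (0.0, 1.0),
}

# Stand-in for a predicted shape embedding of one detected object.
predicted_w = (0.95, 0.05)

nearest = retrieve_nth(predicted_w, cad_embeddings, n=1)
second_nearest = retrieve_nth(predicted_w, cad_embeddings, n=2)
```

With `n` greater than 1, the same lookup returns lower-ranked neighbours, which is the quantity varied in the right-hand plot of Figure 5. A brute-force scan is used here for clarity only.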

Paper Structure (23 sections, 5 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Intro: FastCAD retrieves and aligns CAD models to point clouds in real time. FastCAD can operate either directly on an RGB-D scan (top left) or on the output of an off-the-shelf reconstruction method that takes an RGB video as its input (bottom left). The graph on the right shows the Scan2CAD instance alignment accuracy as a function of inference time compared to competing methods. Note that the inference time is displayed on a log scale. Closed circles and stars (ours) denote methods operating on RGB-D scans, while open circles and stars represent methods using RGB videos as inputs. FastCAD outperforms previous methods in both settings while being significantly faster than the previously fastest methods. Note that the RayTran authors did not disclose their run-times, but RayTran is most likely much slower than FastCAD (see Supp. Mat.).
  • Figure 2: Method. FastCAD retrieves and aligns CAD models for all objects detected in an input point cloud. For all detected objects it predicts their category $\hat{\boldsymbol{p}}$, bounding box parameters $\hat{\boldsymbol{b}}$, front-facing side $\hat{\boldsymbol{f}}$ and shape embedding $\hat{\boldsymbol{w}}$. The predicted embedding vector $\hat{\boldsymbol{w}}$ is used to retrieve the nearest neighbour CAD model from an embedding space previously learned in a contrastive learning setting with auxiliary tasks.
  • Figure 3: Qualitative visualisation on ScanNet. Column 1 shows the reconstruction generated by applying DGrecon to the input video. Column 2 shows the CAD retrieval and alignments predicted by FastCAD when operating on the reconstruction in column 1. Columns 3 and 4 show the input scan from ScanNet and the CAD alignments FastCAD predicts for it. Column 5 shows the ground-truth CAD alignments from Scan2CAD.
  • Figure 4: Investigating different thresholds for the instance alignment accuracy on Scan2CAD. The translation, rotation and scale thresholds, used to determine whether an alignment is correct, are varied from their default values of 20 cm, 20° and 20%. Note that in each plot, the thresholds that are not investigated remain at their default value. FastCAD outperforms competing methods across all thresholds.
  • Figure 5: CAD retrieval from the learned embedding space. Left: Qualitative visualisation of the retrieved CAD model for a given object in a scene. Note that the input to FastCAD from which a shape embedding $\hat{\boldsymbol{w}}$ is predicted is the scan of the entire scene. However, for clearer visualisation, we only show the cropped part of the scan for which a CAD model is retrieved. Across different object categories, our CAD retrievals are of similarly high quality to those from the pseudo-labelling method ScanNotate and the ground-truth CAD models from Scan2CAD. Right: Our shape accuracy as a function of the N-th nearest CAD model retrieved from the embedding space. The shape accuracy remains high even as CAD models of increasingly worse rank are retrieved, which is a characteristic of a well-structured embedding space.
  • ...and 5 more figures
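
The threshold test described in the Figure 4 caption can be sketched as follows, using the stated defaults of 20 cm, 20° and 20%. This is a minimal sketch rather than the benchmark's evaluation code. It makes several assumptions: translation error is taken as the Euclidean distance in metres, rotation error as the geodesic angle between rotation matrices, scale error as the largest per-axis relative deviation, and thresholds are inclusive. It also ignores any handling of object symmetries or category matching that a benchmark might apply. The function names are hypothetical.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def rotation_angle_deg(r_pred: Matrix, r_gt: Matrix) -> float:
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    # trace(R_pred^T R_gt) equals the sum of element-wise products.
    trace = sum(r_pred[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def is_alignment_correct(
    t_pred: Sequence[float],
    t_gt: Sequence[float],
    r_pred: Matrix,
    r_gt: Matrix,
    s_pred: Sequence[float],
    s_gt: Sequence[float],
    max_translation_m: float = 0.20,
    max_rotation_deg: float = 20.0,
    max_scale_pct: float = 20.0,
) -> bool:
    """Check all three error terms against their thresholds (assumed inclusive)."""
    translation_error = math.dist(t_pred, t_gt)
    rotation_error = rotation_angle_deg(r_pred, r_gt)
    # Assumption: scale error is the worst per-axis relative deviation in percent.
    scale_error = 100.0 * max(abs(p / g - 1.0) for p, g in zip(s_pred, s_gt))
    return (
        translation_error <= max_translation_m
        and rotation_error <= max_rotation_deg
        and scale_error <= max_scale_pct
    )


def rot_z(degrees: float) -> list:
    """Rotation about the z-axis, used only to build toy inputs."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


identity = rot_z(0.0)

# 10 cm offset, 10 degree rotation, 10% scale deviation: within all defaults.
within = is_alignment_correct(
    (0.1, 0.0, 0.0), (0.0, 0.0, 0.0), rot_z(10.0), identity,
    (1.1, 1.0, 1.0), (1.0, 1.0, 1.0),
)

# Same pose but rotated by 30 degrees: exceeds the rotation threshold.
too_rotated = is_alignment_correct(
    (0.1, 0.0, 0.0), (0.0, 0.0, 0.0), rot_z(30.0), identity,
    (1.1, 1.0, 1.0), (1.0, 1.0, 1.0),
)
```

Varying one of the three keyword thresholds while leaving the other two at their defaults mirrors the per-plot sweep that the caption describes.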