Supersampling of Data from Structured-light Scanner with Deep Learning

Martin Melicherčík, Lukáš Gajdošech, Viktor Kocur, Martin Madaras

TL;DR

The paper tackles the high computational cost of processing high-resolution depth maps from structured-light cameras. The depth map is down-sampled, the costly processing is performed at low resolution, and a deep-learning model up-samples the result back to high resolution. The paper adapts two state-of-the-art depth-map super-resolution models, DKN and FDSR, to a custom Photoneo MotionCam-3D dataset using targeted data-preparation steps, including hole filling and texture augmentation, and introduces an object-focused loss that emphasizes the scanned object. The authors show that FDSR is significantly faster, while DKN yields higher precision, and that both outperform simple nearest-neighbor up-sampling on depth-map and point-cloud metrics. Performing costly steps at low resolution and then up-sampling offers a path toward real-time or near-real-time 3D data processing on accessible hardware.
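The TL;DR mentions an object-focused loss but not its exact form. One plausible shape, sketched here purely as an assumption, is a per-pixel L1 error in which pixels inside a binary object mask (such as the object map of Figure 5) carry a larger weight than background pixels. The function name `object_weighted_l1` and the default weight of 10 are hypothetical, not values from the paper.

```python
import numpy as np

def object_weighted_l1(pred, target, object_mask, object_weight=10.0):
    """Hypothetical object-focused loss: per-pixel L1 error where pixels on the
    scanned object count `object_weight` times more than background pixels."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Weight map: object_weight on the object, 1.0 elsewhere.
    weights = np.where(object_mask, object_weight, 1.0)
    # Normalize by the total weight so the loss scale does not depend on object size.
    return float(np.sum(weights * np.abs(pred - target)) / np.sum(weights))
```

With the mask set to all `False`, this reduces to the ordinary mean absolute error, so the weight acts as a single knob for how strongly the object dominates training.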

Abstract

This paper focuses on increasing the resolution of depth maps obtained from 3D cameras using structured-light technology. Two deep learning models, FDSR and DKN, are modified to work with high-resolution data, and data pre-processing techniques are implemented for stable training. The models are trained on our custom dataset of 1200 3D scans. The resulting high-resolution depth maps are evaluated using qualitative and quantitative metrics. This approach to depth map upsampling offers two benefits. It can reduce the processing time of a pipeline by first downsampling a high-resolution depth map, performing the processing steps at the lower resolution, and then upsampling the resulting depth map. It can also increase the resolution of a point cloud captured at lower resolution by a cheaper device. The experiments demonstrate that the FDSR model excels in processing speed, making it a suitable choice for applications where speed is crucial. The DKN model, on the other hand, provides results with higher precision, making it more suitable for applications that prioritize accuracy.
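The down-sample, process, up-sample pipeline described in the abstract can be sketched with NumPy. This is a minimal sketch, not the authors' code. Strided subsampling is an assumed down-sampling scheme. `process_low_res` is a hypothetical placeholder for the costly processing steps. Nearest-neighbor up-sampling stands in for a learned up-sampler such as FDSR or DKN.

```python
import numpy as np

def downsample(depth, s):
    """Keep every s-th pixel in each direction (an assumed down-sampling scheme)."""
    return depth[::s, ::s]

def upsample_nearest(depth_lr, s, shape):
    """Nearest-neighbor up-sampling: repeat each LR pixel into an s x s block,
    then crop to the requested high-resolution shape."""
    up = np.repeat(np.repeat(depth_lr, s, axis=0), s, axis=1)
    return up[: shape[0], : shape[1]]

def process_low_res(depth_lr):
    """Hypothetical placeholder for the costly processing steps (identity here)."""
    return depth_lr

def pipeline(depth_hr, s=4, upsample=upsample_nearest):
    """Down-sample, process at low resolution, then up-sample to the input shape."""
    depth_lr = process_low_res(downsample(depth_hr, s))
    return upsample(depth_lr, s, depth_hr.shape)
```

To swap in another up-sampler, pass a different `upsample` callable with the same `(depth_lr, s, shape)` signature. The rest of the pipeline does not change. With $s = 4$, the processing step sees $1/16$ of the original pixels.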

Paper Structure (21 sections, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Samples from Photoneo MotionCam-3D. The depth map (left) is visualized in RGB, with depth mapped linearly to a green-to-blue color scale. White pixels are undefined. The intensity map is shown on the right.
  • Figure 2: Point clouds from the input HR depth map (left) and the down-sampled LR depth map (right) with scaling factor $s = 4$.
  • Figure 3: From the input (a), we generate the hole map (b) and label distinct holes with unique numbers (c). Background holes are filled with a single value (d). For the remaining holes, we determine the defined outer border pixels and find the maximum for each row (e) (g). Finally, we fill these holes row-wise (f) (h) and arrive at a fully defined depth map (i).
  • Figure 4: Point cloud from the filled depth map. Defined points are shown in red; green points are filled by our proposed procedure.
  • Figure 5: (a) Yellow points represent the maximal-depth group of grid pixels, pink points represent the near-mean group of grid pixels, and red points are the chosen vertices of the triangle that determines the scene's ground plane. (b) The output object map.
  • ...and 2 more figures
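The hole-filling procedure in the Figure 3 caption can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Undefined pixels are assumed to be NaN, and holes are treated as 4-connected components. A "background" hole is assumed to be one that touches the image border. Its fill value defaults to the maximum defined depth. All of these choices are hypothetical. Every other hole is filled row by row with the maximum depth among its defined outer-border pixels in that row.

```python
from collections import deque

import numpy as np

def label_holes(holes):
    """Label 4-connected components of a boolean hole map; returns (labels, count)."""
    h, w = holes.shape
    labels = np.zeros((h, w), dtype=np.int32)
    count = 0
    for r0, c0 in zip(*np.nonzero(holes)):
        if labels[r0, c0]:
            continue  # already part of a labeled hole
        count += 1
        labels[r0, c0] = count
        queue = deque([(r0, c0)])
        while queue:  # breadth-first flood fill
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and holes[rr, cc] and not labels[rr, cc]:
                    labels[rr, cc] = count
                    queue.append((rr, cc))
    return labels, count

def dilate4(mask):
    """Grow a boolean mask by one pixel in the four axis directions."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def fill_holes(depth, background_value=None):
    """Fill NaN holes: border-touching holes get one value, others are filled row-wise."""
    depth = np.array(depth, dtype=np.float64)  # work on a copy
    holes = np.isnan(depth)
    if background_value is None:
        background_value = np.nanmax(depth)  # assumed default fill for background holes
    labels, count = label_holes(holes)
    h, w = depth.shape
    for k in range(1, count + 1):
        hole = labels == k
        rows, cols = np.nonzero(hole)
        if rows.min() == 0 or rows.max() == h - 1 or cols.min() == 0 or cols.max() == w - 1:
            depth[hole] = background_value  # treated as a background hole
            continue
        # Defined pixels on this hole's outer border (4-connected neighbors).
        border = dilate4(hole) & ~holes
        for r in np.unique(rows):
            # Fill this row of the hole with the maximum border depth in the same row.
            depth[r, hole[r]] = depth[r, border[r]].max()
    return depth
```

Consider an interior hole that does not touch the image border. In every row, its leftmost undefined pixel has a defined left neighbor, so the row-wise maximum always exists. The per-hole dilation scans the whole image. A production version would crop each hole to its bounding box first.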
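In Figure 5, three points are chosen as the vertices of a triangle that fixes the scene's ground plane, and an object map is then produced. This step is not detailed here, so the sketch below takes the three vertices as given and does not reproduce how they are picked from the yellow and pink pixel groups. One possible way to get an object map is to fit a plane through the three vertices and threshold each pixel's distance to it. This is an assumption-laden sketch, not the authors' method. The per-pixel `(H, W, 3)` point array, the absolute point-to-plane distance, and the tolerance `tol` are all hypothetical choices.

```python
import numpy as np

def plane_from_points(p0, p1, p2):
    """Unit normal and a point on the plane through three 3D points."""
    p0, p1, p2 = (np.asarray(p, dtype=np.float64) for p in (p0, p1, p2))
    normal = np.cross(p1 - p0, p2 - p0)
    norm = np.linalg.norm(normal)
    if norm == 0:
        raise ValueError("points are collinear; they do not define a plane")
    return normal / norm, p0

def object_map(points, p0, p1, p2, tol):
    """Boolean map of pixels whose 3D point lies farther than `tol` from the ground plane.

    points: (H, W, 3) array of per-pixel 3D coordinates (assumed input layout)."""
    normal, origin = plane_from_points(p0, p1, p2)
    dist = np.abs((np.asarray(points, dtype=np.float64) - origin) @ normal)
    return dist > tol
```

The absolute distance marks anything off the plane on either side. Using the signed distance, oriented toward the camera, would keep only points in front of the ground plane.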