Enhancement of 3D Camera Synthetic Training Data with Noise Models

Katarína Osvaldová; Lukáš Gajdošech; Viktor Kocur; Martin Madaras

TL;DR

This paper addresses the domain gap between synthetic and real 3D camera data by modeling two principal noise components, lateral (in-image-plane) noise and axial (depth) noise, and estimating their dependence on object distance and surface angle from a custom dataset. The authors fit quadratic noise models for each device and use them to augment synthetic training data for a UNet-based object segmentation task, showing that training with slightly more noise than estimated (noise multiplier $M_n \approx 1.25$) yields the best real-world generalization. They validate the approach on real Armadillo scans across multiple distances and show that both under- and over-noising can harm performance, which underlines the value of device-specific noise models for realistic synthetic data generation. The work provides practical noise-modeling tools and a data-sharing setup that can improve the robustness of depth-based neural networks in real-world applications, and it outlines avenues for extending the noise types and representations.
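A per-device quadratic noise model can be fitted as an ordinary least-squares problem. The sketch below is a minimal, stdlib-only illustration that assumes the noise standard deviation is modelled as a quadratic in a single variable (here distance z, in metres); the paper's actual parametrisation, which also involves surface angle, may differ, and the names `fit_quadratic` and `predict` are hypothetical.

```python
"""Least-squares fit of a hypothetical quadratic noise model sigma(z) = a0 + a1*z + a2*z^2."""
from typing import List, Sequence, Tuple


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve the 3x3 system m @ x = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_quadratic(z: Sequence[float], sigma: Sequence[float]) -> Tuple[float, float, float]:
    """Return (a0, a1, a2) minimising sum_i (sigma_i - a0 - a1*z_i - a2*z_i^2)^2."""
    # Normal equations (X^T X) a = X^T y with design-matrix rows [1, z_i, z_i^2].
    s = [sum(zi ** k for zi in z) for k in range(5)]  # power sums of z
    t = [sum(yi * zi ** k for zi, yi in zip(z, sigma)) for k in range(3)]
    xtx = [[s[i + j] for j in range(3)] for i in range(3)]
    a0, a1, a2 = _solve3(xtx, t)
    return a0, a1, a2


def predict(coeffs: Tuple[float, float, float], z: float) -> float:
    """Evaluate the fitted quadratic at distance z."""
    a0, a1, a2 = coeffs
    return a0 + a1 * z + a2 * z * z


if __name__ == "__main__":
    # Illustrative values generated from a known quadratic, not data from the paper.
    zs = [0.5, 0.75, 1.0, 1.25, 1.5]
    sig = [1.0 + 2.0 * z + 3.0 * z * z for z in zs]
    print(fit_quadratic(zs, sig))  # approximately (1.0, 2.0, 3.0)
```

Solving the 3x3 normal equations directly keeps the sketch dependency-free; with real measurements, a library routine such as a polynomial fit in NumPy would be the more robust choice.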

Abstract

The goal of this paper is to assess the impact of noise in 3D camera-captured data by modeling the noise of the imaging process and applying it to synthetic training data. We compiled a dataset of specifically constructed scenes to obtain a noise model. We model lateral noise, which affects the position of captured points in the image plane, and axial noise, which affects their position along the axis perpendicular to the image plane. The estimated models can be used to emulate noise in synthetic training data. The benefit of adding artificial noise is evaluated in an experiment with rendered data for object segmentation. We train a series of neural networks with varying levels of noise in the data and measure their ability to generalize to real data. The results show that using too little or too much noise can hurt the networks' performance, indicating that obtaining a noise model from real scanners is beneficial for synthetic data generation.
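Emulating both noise types on a rendered range image could look roughly like the sketch below. It is not the authors' implementation. It assumes a pinhole camera with a hypothetical focal length `focal_px` (in pixels) to convert lateral noise from metres to pixels, emulates lateral noise by resampling each pixel from a randomly displaced neighbour, and adds zero-mean Gaussian axial noise. The names `add_noise`, `sigma_axial`, and `sigma_lateral` are hypothetical, and `m_n` is a global multiplier scaling both noise levels (the TL;DR reports $M_n \approx 1.25$ as the best setting).

```python
import random
from typing import Callable, List, Optional

Depth = List[List[float]]  # range image in metres; 0.0 marks a missing value


def add_noise(
    depth: Depth,
    sigma_axial: Callable[[float], float],
    sigma_lateral: Callable[[float], float],
    focal_px: float,
    m_n: float = 1.0,
    seed: Optional[int] = None,
) -> Depth:
    """Return a noisy copy of `depth` (hypothetical emulation, not the paper's code)."""
    rng = random.Random(seed)
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            if z <= 0.0:
                continue  # missing pixels stay missing
            # Lateral noise: std in metres at depth z, converted to pixels (pinhole model).
            s_px = m_n * sigma_lateral(z) * focal_px / z
            su = min(max(round(u + rng.gauss(0.0, s_px)), 0), w - 1)
            sv = min(max(round(v + rng.gauss(0.0, s_px)), 0), h - 1)
            z_src = depth[sv][su]
            if z_src <= 0.0:
                continue  # resampled from a hole, so this pixel becomes missing
            # Axial noise: zero-mean Gaussian perturbation along the optical axis.
            out[v][u] = z_src + rng.gauss(0.0, m_n * sigma_axial(z_src))
    return out


if __name__ == "__main__":
    plane = [[1.0] * 8 for _ in range(6)]  # flat wall 1 m from the camera
    # Illustrative noise functions, not the estimated device models.
    noisy = add_noise(plane, lambda z: 0.002 * z * z, lambda z: 0.001,
                      focal_px=500.0, m_n=1.25, seed=0)
```

Resampling from displaced neighbours shifts depth discontinuities in the image plane, which is how lateral noise shows up at object boundaries; with `m_n=0` the function returns the input unchanged, which makes a convenient sanity check.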

Paper Structure

This paper contains 21 sections, 2 equations, 11 figures, and 1 table.

Introduction
Related Work
Structured Light Scanning
Time-of-Flight Scanning
Sources of Noise and Errors in 3D Scanning
Training NNs using Synthetic Data
Estimating 3D Camera Noise Parameters
Lateral and Axial Noise
Lateral noise
Axial noise
Custom Dataset
Lateral Noise Estimation
Axial Noise Estimation
Noise Models
Enhancement of Synthetic Training Data with Emulated Noise
(6 further sections not listed)

Figures (11)

Figure 1: (a) Cropped range image of a white paper (blue rectangular area) positioned 1.25 m away from the camera at a 20° angle captured by Kinect v1. White pixels represent missing values. Lateral noise can be seen at the paper boundaries which are straight in the real scene. (b) Cropped range image of a planar wall captured by Kinect v2 at 90 cm distance with notable axial noise.
Figure 2: Physical setup for capturing surface at different angles and distances.
Figure 3: Range images captured by each device. The scene contains a white paper at 1 m distance and a 30° angle, captured as portrayed in Figure 2.
Figure 4: Normalised histograms of lateral error values, collected from 200 images captured by Kinect v1 and Kinect v2 and 100 images captured by MotionCam-3D. Each histogram represents a scene containing the white paper at a 0° angle. The distance differs for each camera and is the shortest at which the paper was captured completely: 1 m for Kinect v1, 0.75 m for Kinect v2, and 0.5 m for MotionCam-3D. Each histogram includes a fitted normal distribution (dashed line).
Figure 5: Visualisation of the relationship between the standard deviation of lateral noise (in mm) and surface angle (left column) or distance (right column). Each row contains data from a different device. Plots in the same column share the x axis, and plots in the same row share the y axis. In the left column, all underlying angle values are multiples of 10°. A random horizontal shift between frames was added for legibility.
(6 further figures not listed)
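Figures 4 and 5 summarise lateral errors by fitting a normal distribution per capture setting and relating its standard deviation to surface angle and distance. A minimal sketch of that bookkeeping uses Python's `statistics.NormalDist.from_samples`; it assumes the per-point lateral errors (in mm) have already been extracted and tagged with capture distance and angle, and the input format and the name `per_setting_fit` are hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[float, float, float]  # (distance_m, angle_deg, lateral_error_mm)


def per_setting_fit(samples: Iterable[Sample]) -> Dict[Tuple[float, float], NormalDist]:
    """Fit a normal distribution to the lateral errors of each (distance, angle) setting."""
    groups: Dict[Tuple[float, float], List[float]] = defaultdict(list)
    for dist, angle, err in samples:
        groups[(dist, angle)].append(err)
    # NormalDist.from_samples uses the sample standard deviation, so it needs >= 2 values.
    return {key: NormalDist.from_samples(errs)
            for key, errs in groups.items() if len(errs) >= 2}


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    data = [(1.0, 0.0, -1.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0),
            (1.0, 30.0, 2.0), (1.0, 30.0, 4.0)]
    for (dist, angle), nd in sorted(per_setting_fit(data).items()):
        print(f"{dist} m, {angle} deg: mean={nd.mean:.2f} mm, std={nd.stdev:.2f} mm")
```

The resulting per-setting standard deviations are the kind of values that a plot like Figure 5 displays and that a quadratic model could then be fitted to.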
