AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation

Yangchao Wu, Tian Yu Liu, Hyoungseob Park, Stefano Soatto, Dong Lao, Alex Wong

TL;DR

Standard augmentations in unsupervised monocular depth completion and estimation can degrade the training signal by introducing reconstruction artifacts. AugUndo undoes geometric transformations by warping the predicted depth map back to the original reference frame, so that losses are computed on the original images and sparse depth maps and a much larger augmentation space becomes usable. The approach consistently improves upon recent methods on the indoor VOID and outdoor KITTI datasets, and in generalization to four additional datasets. Code is publicly available.

Abstract

Unsupervised depth completion and estimation methods are trained by minimizing reconstruction error. Block artifacts from resampling, intensity saturation, and occlusions are among the many undesirable by-products of common data augmentation schemes that affect image reconstruction quality, and thus the training signal. Hence, typical image augmentations viewed as essential to training pipelines in other vision tasks have seen limited use beyond small image intensity changes and flipping. Augmentations on the sparse depth modality in depth completion have seen even less use, as intensity transformations alter the scale of the 3D scene and geometric transformations may decimate the sparse points during resampling. We propose a method that unlocks a wide range of previously infeasible geometric augmentations for unsupervised depth completion and estimation. This is achieved by reversing, or "undo"-ing, the geometric transformations applied to the coordinates of the output depth, warping the depth map back to the original reference frame. This enables computing the reconstruction losses using the original images and sparse depth maps, eliminating the pitfalls of naive loss computation on the augmented inputs and allowing us to scale up augmentations to boost performance. We demonstrate our method on indoor (VOID) and outdoor (KITTI) datasets, where we consistently improve upon recent methods across both datasets as well as in generalization to four other datasets. Code available at: https://github.com/alexklwong/augundo.
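
The "undo" step described in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository, and every name in it (`warp`, `flip_translate`, `toy_depth_model`, `sparse_l1`) is hypothetical. It assumes the simplest possible setting: a horizontal flip followed by an integer shift, nearest-neighbor resampling, and a stand-in per-pixel "model" in place of a depth network. The augmented image is fed to the model, the predicted depth is warped back to the original frame, and the loss is then evaluated against the original, untouched sparse depth map, skipping pixels that left the frame.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[Optional[float]]]
CoordMap = Callable[[int, int], Tuple[int, int]]


def warp(grid: Grid, source_of: CoordMap) -> Grid:
    """Nearest-neighbor resampling: out[y][x] = grid at source_of(x, y), None if outside."""
    h, w = len(grid), len(grid[0])
    out: Grid = []
    for y in range(h):
        row: List[Optional[float]] = []
        for x in range(w):
            sx, sy = source_of(x, y)
            row.append(grid[sy][sx] if 0 <= sx < w and 0 <= sy < h else None)
        out.append(row)
    return out


def flip_translate(width: int, dx: int, dy: int) -> Tuple[CoordMap, CoordMap]:
    """Hypothetical augmentation: horizontal flip, then an integer shift by (dx, dy)."""
    def forward(x: int, y: int) -> Tuple[int, int]:
        # Where an original pixel lands in the augmented frame.
        return width - 1 - x + dx, y + dy

    def inverse(x: int, y: int) -> Tuple[int, int]:
        # Where an augmented pixel came from in the original frame.
        return width - 1 - (x - dx), y - dy

    return forward, inverse


def toy_depth_model(image: Grid) -> Grid:
    """Stand-in for a depth network (assumption): depth = 1 + intensity, padding read as 0."""
    return [[1.0 + (v if v is not None else 0.0) for v in row] for row in image]


def sparse_l1(pred: Grid, sparse: Grid) -> float:
    """Mean absolute error over pixels where both prediction and sparse depth are valid."""
    errs = [abs(p - s)
            for pred_row, sparse_row in zip(pred, sparse)
            for p, s in zip(pred_row, sparse_row)
            if p is not None and s is not None]
    return sum(errs) / len(errs) if errs else 0.0


image: Grid = [[0.1, 0.2, 0.3, 0.4],
               [0.5, 0.6, 0.7, 0.8],
               [0.9, 1.0, 1.1, 1.2]]
# Original sparse depth; None marks pixels without a measurement.
sparse_depth: Grid = [[None, 1.2, None, None],
                      [1.5, None, None, 1.8],
                      [None, None, 2.1, None]]

forward, inverse = flip_translate(width=4, dx=1, dy=0)
augmented = warp(image, inverse)          # augment the input image only
depth_aug = toy_depth_model(augmented)    # predict depth in the augmented frame
depth_undone = warp(depth_aug, forward)   # undo: warp depth back to the original frame
loss = sparse_l1(depth_undone, sparse_depth)  # loss against the un-augmented sparse depth
print(loss)
```

In this sketch the sparse depth map is never resampled, so none of its points are lost to the augmentation; only the dense prediction is warped. Pixels of the original frame that the augmentation pushed out of view come back as `None` and are excluded from the loss. Continuous transformations such as rotation or resizing would need interpolation in place of the integer lookup used here.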

Paper Structure (22 sections, 2 equations, 2 figures, 1 table)

This paper contains 22 sections, 2 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: One kernel at $x_s$ (dotted kernel) or two kernels at $x_i$ and $x_j$ (left and right) lead to the same summed estimate at $x_s$.
  • Figure 2: Centered, short example caption