DecTrain: Deciding When to Train a Monocular Depth DNN Online

Zih-Sing Fu; Soumya Sudhakar; Sertac Karaman; Vivienne Sze

TL;DR

DecTrain addresses the accuracy degradation of monocular depth estimation under deployment distribution shift by deciding when to perform online self-supervised training. A lightweight decision DNN predicts the utility of training at each timestep, and a greedy policy trains only when the predicted gain justifies the cost, balancing accuracy and computation. Empirically, DecTrain nearly matches the accuracy of training at all timesteps while reducing computation (e.g., by about 27% on indoor sequences), and it lets a low-cost DNN compete with a higher-cost model at lower total cost, sometimes with better accuracy. The approach combines uncertainty-aware inputs describing the margin to improve and the ability to improve, an MDP framing for metareasoning, and selective online updates to deliver practical, energy-efficient online adaptation for robotic perception.
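The greedy rule summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `should_train` and `run_greedy_policy` are hypothetical, the predicted utilities stand in for the output of the decision DNN, and the sketch assumes that utility and training cost are expressed in the same (unspecified) units.

```python
from typing import List, Sequence


def should_train(predicted_utility: float, training_cost: float) -> bool:
    """Greedy rule: train at this timestep only if the predicted
    utility of training exceeds the cost of training.

    Both arguments are assumed to share the same units (an assumption
    of this sketch, not a detail taken from the paper).
    """
    return predicted_utility > training_cost


def run_greedy_policy(predicted_utilities: Sequence[float],
                      training_cost: float) -> List[bool]:
    """Apply the greedy rule independently at every timestep."""
    return [should_train(u, training_cost) for u in predicted_utilities]


def training_fraction(decisions: Sequence[bool]) -> float:
    """Fraction of timesteps at which training was performed."""
    return sum(decisions) / len(decisions) if decisions else 0.0
```

For example, `run_greedy_policy([0.1, 0.5, 0.3, 0.9], training_cost=0.3)` trains at the second and fourth timesteps only, since a utility equal to the cost does not justify training under the strict inequality used here.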

Abstract

Deep neural networks (DNNs) can deteriorate in accuracy when deployment data differs from training data. While performing online training at all timesteps can improve accuracy, it is computationally expensive. We propose DecTrain, a new algorithm that decides when to train a monocular depth DNN online using self-supervision with low overhead. To make the decision at each timestep, DecTrain compares the cost of training with the predicted accuracy gain. We evaluate DecTrain on out-of-distribution data and find that it maintains accuracy compared to online training at all timesteps while training only 44% of the time on average. We also compare the recovery of a low-inference-cost DNN using DecTrain with that of a more generalizable high-inference-cost DNN on various sequences. DecTrain recovers the majority (97%) of the accuracy gain of online training at all timesteps while reducing computation compared to the high-inference-cost DNN, which recovers only 66%. With an even smaller DNN, we achieve 89% recovery while reducing computation by 56%. DecTrain enables low-cost online training for a smaller DNN to have competitive accuracy with a larger, more generalizable DNN at a lower overall computational cost.
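A small cost model helps show why the fraction of skipped training steps does not translate one-to-one into compute savings: inference and the decision overhead are paid at every timestep, and only the training term scales with how often training runs. The sketch below is an assumption-laden illustration, not the paper's accounting. The function names and the per-step cost values in the usage note are hypothetical placeholders in arbitrary units; only the 44% training fraction comes from the abstract.

```python
def total_cost(n_steps: int,
               inference_cost: float,
               training_cost: float,
               train_fraction: float,
               decision_overhead: float = 0.0) -> float:
    """Total compute over a sequence under a simple additive model.

    Inference and decision overhead are paid at every timestep;
    training is paid only on the fraction of timesteps where it runs.
    All costs are in the same arbitrary units (hypothetical).
    """
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be in [0, 1]")
    per_step = inference_cost + decision_overhead + train_fraction * training_cost
    return n_steps * per_step


def relative_saving(baseline_cost: float, candidate_cost: float) -> float:
    """Fractional compute reduction of the candidate versus the baseline."""
    return 1.0 - candidate_cost / baseline_cost
```

With placeholder values such as `inference_cost=1.0`, `training_cost=3.0`, and `decision_overhead=0.1`, comparing `total_cost(..., train_fraction=1.0)` without overhead against `total_cost(..., train_fraction=0.44, decision_overhead=0.1)` gives a saving smaller than the 56% of training steps skipped. The actual savings reported in the paper depend on the measured costs of the specific DNNs and are not reproduced by these numbers.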

Paper Structure (23 sections, 9 equations, 9 figures, 3 tables)

Introduction
Related Work
Domain adaptation
Sample selection during training
DNN uncertainty estimation
Markov decision processes for modeling decision-making
Problem Definition
DecTrain: Deciding When to Train
Learning to predict the utility of training
Inputs relevant to margin to improve
Inputs relevant to ability to improve
Training decision DNN
Greedy decision-making
Experimental Setup
Pretraining monocular depth DNN and decision DNN
...and 8 more sections

Figures (9)

Figure 1: DecTrain (red) decides when to perform online training based on margin to improve (visualized by the gap between the blue and black lines) and ability to improve (visualized by the texture and sharpness in the image). Compared to the baseline of online training at all timesteps (black) or no timesteps (blue), DecTrain maintains the accuracy improvement of adaptation while training on only a subset of the timesteps (dashed lines denote new sequence).
Figure 2: DecTrain overview: at each timestep, the decision DNN takes inputs relevant to the margin and ability to improve to predict the utility of training, which is compared to the cost of training to decide when to train the monocular depth DNN.
Figure 3: DecTrain lowers computation by 27% vs. online training at all timesteps. Error bars are one standard deviation.
Figure 4: Examples of RGB input, ground-truth depth, depth prediction, and error for the baselines and DecTrain from Exp. 5. Compared to no online training, both online training at all timesteps and DecTrain have lower error (see white boxed regions). Online training improves the depth scale (e.g., walls and floor in second row) and the prediction for unknown objects (e.g., painting in third row), and DecTrain mimics the improved performance at lower computational cost. While the off-the-shelf self-supervised loss (Vödisch et al., 2023) improves the scale, it also introduces artifacts that make the predictions visually less smooth. These artifacts are also present with online training at all timesteps, so they stem from the loss function, not from DecTrain.
Figure 5: Histograms of training decisions on experiments from ScanNet and Sun3D. DecTrain and the greedy oracle reduce the amount of training vs. online training at all timesteps, and DecTrain closely follows the greedy oracle. Note, online training can only run when a SLAM pose is available.
...and 4 more figures
