CoDEPS: Online Continual Learning for Depth Estimation and Panoptic Segmentation

Niclas Vödisch, Kürsat Petek, Wolfram Burgard, Abhinav Valada

TL;DR

This work addresses online continual learning for deep learning-based monocular depth estimation and panoptic segmentation in new environments. It introduces CoDEPS, which performs continual learning across multiple real-world domains and mitigates catastrophic forgetting by leveraging experience replay.

Abstract

Operating a robot in the open world requires a high level of robustness with respect to previously unseen environments. Ideally, the robot is able to adapt by itself to new conditions without human supervision, e.g., automatically adjusting its perception system to changing lighting conditions. In this work, we address the task of continual learning for deep learning-based monocular depth estimation and panoptic segmentation in new environments in an online manner. We introduce CoDEPS to perform continual learning involving multiple real-world domains while mitigating catastrophic forgetting by leveraging experience replay. In particular, we propose a novel domain-mixing strategy to generate pseudo-labels to adapt panoptic segmentation. Furthermore, we explicitly address the limited storage capacity of robotic systems by leveraging sampling strategies for constructing a fixed-size replay buffer based on rare semantic class sampling and image diversity. We perform extensive evaluations of CoDEPS on various real-world datasets, demonstrating that it successfully adapts to unseen environments without sacrificing performance on previous domains while achieving state-of-the-art results. The code of our work is publicly available at http://codeps.cs.uni-freiburg.de.
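
To make the replay-buffer idea concrete, the following is a minimal sketch of a fixed-size buffer that favors diverse samples, plus a helper that turns class frequencies into sampling probabilities favoring rare classes. It is not the CoDEPS implementation: the class `DiversityReplayBuffer`, the function `rare_class_probabilities`, the Euclidean distance on feature vectors, the replacement rule, and the softmax with a `temperature` parameter are all assumptions chosen for illustration.

```python
import numpy as np


def rare_class_probabilities(class_frequencies, temperature=0.1):
    """Softmax over (1 - frequency): rarer classes get higher probability.

    Hypothetical helper; the weighting scheme is an assumption, not the paper's.
    """
    freqs = np.asarray(class_frequencies, dtype=float)
    logits = (1.0 - freqs) / temperature
    logits -= logits.max()  # subtract the maximum for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


class DiversityReplayBuffer:
    """Fixed-size buffer that tries to keep mutually distant feature vectors."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.features = []

    def _min_distance(self, feature, exclude=None):
        # Distance from `feature` to its nearest stored neighbor,
        # optionally ignoring the stored entry at index `exclude`.
        dists = [
            float(np.linalg.norm(feature - other))
            for i, other in enumerate(self.features)
            if i != exclude
        ]
        return min(dists) if dists else float("inf")

    def add(self, feature):
        """Try to store a sample; return True if the buffer changed."""
        feature = np.asarray(feature, dtype=float)
        if len(self.features) < self.capacity:
            self.features.append(feature)
            return True
        # The most redundant stored sample is the one closest to another one.
        redundancy = [
            self._min_distance(f, exclude=i) for i, f in enumerate(self.features)
        ]
        victim = int(np.argmin(redundancy))
        # Replace it only if the candidate is farther from the remaining samples.
        if self._min_distance(feature, exclude=victim) > redundancy[victim]:
            self.features[victim] = feature
            return True
        return False
```

In this sketch the buffer never grows beyond `capacity`, so memory use stays bounded regardless of how many frames the camera stream delivers. In practice the feature vectors would come from some image embedding, which is left unspecified here.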

Paper Structure

This paper contains 12 sections, 10 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Neural networks often perform poorly when deployed on a target domain that differs from the source domain used for training. To close this domain gap, we propose to continuously adapt the network by exploiting online target images. To mitigate catastrophic forgetting and enhance generalizability, we leverage a fixed-size replay buffer allowing the method to revisit data from both the source and target domains.
  • Figure 2: Overview of our proposed CoDEPS. Unlabeled RGB images from an online camera stream are combined with samples from a replay buffer comprising both annotated source samples and previously seen target images. Cross-domain mixing enables pseudo-supervision on the target domain. The network weights are then updated via backpropagation using the constructed data batch. The additional PoseNet required for unsupervised monocular depth estimation is omitted in this visualization.
  • Figure 3: Our proposed cross-domain mixing strategy first transfers the image style from the target to the source sample. Then it augments the target image to match the appearance of the source camera. Finally, a random image patch is copied from the target to the source image. The source annotations are retained and completed by the network's estimate on the copied image patch. The result serves as pseudo-label, combining self-iterative learning with ground truth supervision.
  • Figure 4: Qualitative results for Cityscapes to KITTI-360 adaptation after pretraining on the source, i.e., 0 steps, and after having seen 1,000 and 2,500 frames. As shown in the left column, CoDEPS is able to avoid catastrophic forgetting on the source domain. The progressive adaptation on the target domain is particularly visible in the image areas highlighted by yellow boxes. "Stuff" classes of similar appearance like sidewalk vs. road (left image) and terrain vs. vegetation (right image) can be better distinguished by CoDEPS. Furthermore, instances become more pronounced as can be observed for the highlighted car (left image) and the cyclist (right image).
  • Figure 5: Evolution of performance metrics on SemKITTI-DVPS sequence 08 during adaptation (protocol 1). The metrics are averaged until the given frame number. The target domains $\mathcal{T}_1$ and $\mathcal{T}_2$ refer to SemKITTI-DVPS and KITTI-360, respectively. It can be seen that there is positive forward transfer when first adapting on $\mathcal{T}_2$.
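
The patch-copy step of the cross-domain mixing described in the Figure 3 caption can be sketched as follows. This is an illustrative approximation, not the authors' code: the function name `mix_patch`, its arguments, and the assumption that source and target images share the same resolution are hypothetical, and the preceding style-transfer and camera-augmentation steps are omitted.

```python
import numpy as np


def mix_patch(source_img, source_label, target_img, target_pred, patch_hw, rng):
    """Copy a random target patch into the source image.

    The source annotation is kept outside the patch and completed inside the
    patch by the network's prediction on the target image (`target_pred`).
    Assumes all four arrays share the same height and width.
    """
    height, width = source_img.shape[:2]
    patch_h, patch_w = patch_hw
    # Pick the top-left corner so that the patch lies fully inside the image.
    top = int(rng.integers(0, height - patch_h + 1))
    left = int(rng.integers(0, width - patch_w + 1))
    rows = slice(top, top + patch_h)
    cols = slice(left, left + patch_w)

    mixed_img = source_img.copy()
    mixed_label = source_label.copy()
    mixed_img[rows, cols] = target_img[rows, cols]      # target pixels
    mixed_label[rows, cols] = target_pred[rows, cols]   # pseudo-label from the network
    return mixed_img, mixed_label, (top, left)


# Toy usage with constant arrays standing in for real images and labels.
rng = np.random.default_rng(0)
source_img = np.zeros((4, 6, 3))
target_img = np.ones((4, 6, 3))
source_label = np.zeros((4, 6), dtype=int)
target_pred = np.full((4, 6), 7, dtype=int)
mixed_img, mixed_label, corner = mix_patch(
    source_img, source_label, target_img, target_pred, (2, 3), rng
)
```

The resulting label map mixes ground truth (outside the patch) with the network's own estimate (inside the patch), which is the combination of supervision and self-training that the caption describes.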