ProtoDepth: Unsupervised Continual Depth Completion with Prototypes
Patrick Rim, Hyoungseob Park, S. Gangopadhyay, Ziyao Zeng, Younjoon Chung, Alex Wong
TL;DR
ProtoDepth addresses unsupervised depth completion under non-stationary data by freezing a pretrained backbone and learning domain-specific prototypes that adapt its latent features through a global multiplicative term $A$ and a local additive bias $B$: $\hat{X} = A \odot X + B$. The additive bias is constructed via attention over a learned prototype bank $P$, $B = \text{softmax}(QK^{T}/\sqrt{c})P$, with keys obtained by projecting the prototypes under a stop-gradient operation, $K = \text{StopGrad}(P)W$, yielding a compact, per-domain adaptation mechanism. The model is trained with the loss $\mathcal{L} = w_{ph}\ell_{ph}+w_{sz}\ell_{sz}+w_{sm}\ell_{sm}$. To handle domain-agnostic inference, the method learns domain descriptors $r_k$ and uses cosine similarity with the input descriptor $s$ to select the appropriate prototype set at test time, optimizing an additional term $\ell_{dr}$ that promotes discriminability between domains. Empirically, ProtoDepth and its agnostic variant ProtoDepth-A reduce forgetting by large margins across indoor and outdoor sequences and achieve state-of-the-art performance in unsupervised continual depth completion while adding only a small fraction of parameters, with applicability to both CNNs and transformers. The approach offers a practical, architecture-agnostic solution for continual learning in multimodal 3D reconstruction tasks.
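To make the adaptation $\hat{X} = A \odot X + B$ concrete, here is a minimal forward-pass sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `adapt_features`, the use of the frozen features themselves as queries $Q$, and the choice of a per-channel vector for the global multiplicative term are all hypothetical. The stop-gradient only affects training, so it appears here as a comment.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product: (n x k) times (k x m) gives (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def adapt_features(X, a, P, W):
    """Return X_hat = A * X + B for one layer of frozen features.

    X: n x c latent features (n spatial positions, c channels).
    a: length-c multiplicative vector (assumed per-channel, shared by all positions).
    P: m x c prototype bank for the current domain.
    W: c x c key projection.
    """
    c = len(X[0])
    Q = X                      # assumption: queries are the frozen features
    K = matmul(P, W)           # K = StopGrad(P) W; StopGrad is a no-op at inference
    K_t = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_t)    # Q K^T, shape n x m
    attn = [softmax([s / math.sqrt(c) for s in row]) for row in scores]
    B = matmul(attn, P)        # local additive bias, shape n x c
    return [[a[j] * X[i][j] + B[i][j] for j in range(c)]
            for i in range(len(X))]

# Toy example: two positions, two channels, one prototype.
X = [[1.0, 0.0], [0.0, 1.0]]
a = [1.0, 1.0]
P = [[2.0, 3.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
X_hat = adapt_features(X, a, P, W)
```

With a single prototype the attention weight is 1 at every position, so the bias simply adds that prototype to each feature vector. With a larger bank, each position receives its own convex combination of prototypes, which is what makes the additive term local.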
Abstract
We present ProtoDepth, a novel prototype-based approach for continual learning of unsupervised depth completion, the multimodal 3D reconstruction task of predicting dense depth maps from RGB images and sparse point clouds. The unsupervised learning paradigm is well-suited for continual learning, as ground truth is not needed. However, when training on new non-stationary distributions, depth completion models will catastrophically forget previously learned information. We address forgetting by learning prototype sets that adapt the latent features of a frozen pretrained model to new domains. Since the original weights are not modified, ProtoDepth does not forget when test-time domain identity is known. To extend ProtoDepth to the challenging setting where the test-time domain identity is withheld, we propose to learn domain descriptors that enable the model to select the appropriate prototype set for inference. We evaluate ProtoDepth on benchmark dataset sequences, where we reduce forgetting compared to baselines by 52.2% for indoor and 53.2% for outdoor to achieve the state of the art.
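The domain-agnostic selection step can be sketched as a nearest-descriptor lookup under cosine similarity. This is a hedged illustration only: the names `cosine_similarity` and `select_domain` are hypothetical, the descriptors are assumed to be plain vectors, and how the input descriptor is computed from a sample is left out because it is not specified here.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors; the small epsilon avoids
    # division by zero for all-zero inputs.
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v + 1e-12)

def select_domain(s, descriptors):
    """Return the index of the learned domain descriptor most similar to s.

    s: descriptor of the test input (assumed to be given).
    descriptors: list of learned domain descriptors, one per seen domain.
    """
    sims = [cosine_similarity(s, r) for r in descriptors]
    return max(range(len(sims)), key=sims.__getitem__)

# Toy example with three domains in a 2-D descriptor space.
domain_descriptors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
chosen = select_domain([0.1, 0.9], domain_descriptors)
```

The returned index would then pick which prototype set to apply to the frozen model. Because cosine similarity ignores vector magnitude, only the direction of the input descriptor matters in this sketch.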
