
ConDo: Continual Domain Expansion for Absolute Pose Regression

Zijun Li, Zhipeng Cai, Bochun Yang, Xuelun Shen, Siqi Shen, Xiaoliang Fan, Michael Paulitsch, Cheng Wang

TL;DR

ConDo tackles the brittleness of Absolute Pose Regression (APR) under continual environmental changes by leveraging unlabeled inference data collected after deployment. It distills robust cues from scene-agnostic localization methods to supervise APR updates, while keeping computation bounded and avoiding full re-training. The authors build large-scale benchmarks spanning indoor and outdoor scenes and long-term changes. On these benchmarks, ConDo delivers consistent, substantial improvements across architectures and data shifts: it matches re-training performance up to 25x faster and cuts localization error by more than 7x on challenging scenes. This approach offers a practical path to life-long visual localization systems that stay accurate as environments evolve.

Abstract

Visual localization is a fundamental machine learning problem. Absolute Pose Regression (APR) trains a scene-dependent model to efficiently map an input image to the camera pose in a pre-defined scene. However, many applications have continually changing environments, where inference data at novel poses or under novel scene conditions (weather, geometry) appear after deployment. Training APR on a fixed dataset leads to overfitting, causing it to fail catastrophically on challenging novel data. This work proposes Continual Domain Expansion (ConDo), which continually collects unlabeled inference data to update the deployed APR. Instead of applying standard unsupervised domain adaptation methods, which are ineffective for APR, ConDo learns from unlabeled data by distilling knowledge from scene-agnostic localization methods. By sampling data uniformly from historical and newly collected data, ConDo effectively expands the generalization domain of APR. Large-scale benchmarks with various scene types are constructed to evaluate models under practical (long-term) data changes. ConDo consistently and significantly outperforms baselines across architectures, scene types, and data changes. On challenging scenes (Fig. 1), it reduces the localization error by more than 7x (14.8 m vs. 1.7 m). Analysis shows that ConDo is robust to compute budgets, replay buffer sizes, and teacher prediction noise. Compared to model re-training, ConDo achieves similar performance up to 25x faster.
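The update described above combines three ingredients: teacher pseudo-labels for unlabeled inference images, uniform sampling over historical and new data, and a fixed compute budget per round. The sketch below shows only that data flow and is not the authors' implementation. The teacher, the `train_step` callback, the pose encoding, and all names (`Sample`, `update_round`, `num_steps`) are hypothetical placeholders, and the replay buffer here is unbounded even though the paper studies different buffer sizes.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Pose = Tuple[float, ...]  # hypothetical pose encoding, e.g. (x, y, z, qw, qx, qy, qz)


@dataclass
class Sample:
    image_id: str
    pose: Pose            # ground truth (training data) or teacher pseudo-label (inference data)
    pseudo: bool = False  # True if the pose came from the teacher, not from ground truth


def update_round(
    buffer: List[Sample],
    new_images: Sequence[str],
    teacher: Callable[[str], Pose],
    train_step: Callable[[List[Sample]], None],
    num_steps: int,
    batch_size: int,
    rng: random.Random,
) -> List[Sample]:
    """One hypothetical ConDo-style update round under a fixed compute budget."""
    # 1. Distillation targets: a scene-agnostic teacher labels the unlabeled uploads.
    buffer.extend(Sample(img, teacher(img), pseudo=True) for img in new_images)
    # 2. Bounded compute: always num_steps optimizer steps, however large the buffer grows.
    for _ in range(num_steps):
        # 3. Uniform sampling over historical (labeled + older pseudo-labeled) and new data.
        batch = rng.sample(buffer, min(batch_size, len(buffer)))
        train_step(batch)  # placeholder for one APR gradient step on the batch
    return buffer
```

Because the step count is fixed per round, the cost of an update does not grow with the amount of data collected, which is the property that makes continual updates cheaper than re-training from scratch.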

Paper Structure

This paper contains 20 sections, 2 equations, 9 figures, 13 tables.

Figures (9)

  • Figure 1: Teaser. We propose Continual Domain Expansion (ConDo) for APR, which uses unlabeled data seen during inference to expand the generalization domain of APR. Novel benchmarks are proposed to study practical scenarios where images are captured at novel poses or in continually changing environments (left). In the histograms, the x-axis lists test data from the various scans and the y-axis shows the median position error. Trained only on data from spring, the deployed APR cannot handle summer and winter data (top). ConDo updates the model continually with unlabeled inference data under a limited computation budget, effectively expanding the generalization domain over time (bottom).
  • Figure 2: ConDo Pipeline. Left: After standard APR training on labeled data, the model is deployed to the client. Right: After deployment, the client uploads the unlabeled data to the server. The server continually expands the generalization domain of APR by updating it with the labeled training data $(\mathcal{S}^\Omega, \mathcal{P}^\Omega)$, the unlabeled data $\Delta$, and a scene-independent teacher method $f_\text{teacher}$ for knowledge distillation. Limited computation is assigned to each round of model update to ensure practical efficiency.
  • Figure 3: Data split visualization. In each scan (training and inference), $\frac{1}{8}$ of the images are held out for evaluation. To create challenging evaluation data, we randomly hold out several sets of images, where each set is a continuous trajectory of $16$ images from the scan. Left: Outdoor Office Loop data. Right: Indoor Chess scene in 7Scenes.
  • Figure 4: Office Loop images. Obvious differences exist between training (Spring Sunny) and inference scans, e.g., over-exposure (Summer Sunny), snow (Winter Snowy) and moving objects (Winter Sunny).
  • Figure 5: Result visualization on Office Loop. We visualize results on training and inference scans, where dark blue points indicate held-out test data and grey-green points indicate training/inference data. Due to space limits, we only visualize one training scan (Train Scan 3); see the Appendix for the other training scans. The train-only model performs well on Train Scan 3 but cannot handle unseen changes in scene conditions (top row). By updating with unlabeled inference data, ConDo not only adapts to the inference scans but also generalizes better on the training ones (median error on the held-out data of Train Scan 3 drops from 1.87 m to 1.22 m).
  • ...and 4 more figures
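The held-out split in Figure 3 (about one-eighth of each scan, taken as continuous 16-image trajectories) can be sketched as follows. This is a hypothetical reconstruction, not the benchmark's released code: it assumes frames are indexed in capture order and that trajectories are drawn from non-overlapping, 16-aligned chunks, so a trailing remainder shorter than 16 frames is never held out. The function name and `seed` parameter are illustrative.

```python
import random
from typing import List, Tuple


def holdout_trajectories(
    num_frames: int, traj_len: int = 16, holdout_frac: float = 1 / 8, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Hypothetical split: hold out random non-overlapping runs of consecutive frames."""
    num_chunks = num_frames // traj_len  # aligned chunks of traj_len consecutive frames
    num_held = min(round(num_frames * holdout_frac / traj_len), num_chunks)
    rng = random.Random(seed)
    chosen = rng.sample(range(num_chunks), num_held)  # distinct chunks, so runs never overlap
    held = sorted(i for c in chosen for i in range(c * traj_len, (c + 1) * traj_len))
    held_set = set(held)
    kept = [i for i in range(num_frames) if i not in held_set]  # used for training/updates
    return held, kept
```

For a 1,280-frame scan this holds out 10 trajectories (160 frames, exactly one-eighth). Holding out whole trajectories rather than scattered frames means a test image's immediate neighbors are not in the training set, which makes the evaluation harder than a random per-frame split.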