Feed-Forward Latent Domain Adaptation

Ondrej Bohdal, Da Li, Shell Xu Hu, Timothy Hospedales

TL;DR

This work introduces Feed-Forward Latent Domain Adaptation (CXDA), a practical framework for adapting a pre-trained model to deployment data drawn from multiple latent domains, without access to source data and without back-propagation. By meta-learning a cross-attention module that jointly processes the query and a set of unlabeled support examples, CXDA selectively draws on relevant instances to adapt inference per example in a streaming, feed-forward manner. Experiments on FEMNIST, CIFAR-C, TinyImageNet-C, and iWildCam show that CXDA consistently outperforms strong ERM baselines and many back-propagation-based methods, while remaining robust to domain mixture and suited to real-time constraints. The results suggest that automated instance selection via cross-attention can sometimes outperform adaptation that relies on manual domain labels, a practical benefit for edge devices facing real-world domain shift.

Abstract

We study a new, highly practical problem setting that enables resource-constrained edge devices to adapt a pre-trained model to their local data distributions. Recognizing that a device's data are likely to come from multiple latent domains that include a mixture of unlabelled domain-relevant and domain-irrelevant examples, we focus on the comparatively under-studied problem of latent domain adaptation. Considering the limitations of edge devices, we aim to use only a pre-trained model and adapt it in a feed-forward way, without using back-propagation and without access to the source data. Modelling these realistic constraints brings us to the novel and practically important problem setting of feed-forward latent domain adaptation. Our solution is to meta-learn a network capable of embedding the mixed-relevance target dataset and dynamically adapting inference for target examples using cross-attention. The resulting framework leads to consistent improvements over strong ERM baselines. We also show that our framework sometimes even improves on the upper bound of domain-supervised adaptation, where only domain-relevant instances are provided for adaptation. This suggests that human-annotated domain labels may not always be optimal, and raises the possibility of doing better through automated instance selection.
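The cross-attention idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `cross_attention_adapt`, the projection matrices `w_q`, `w_k`, `w_v`, the residual connection, and the feature sizes are all assumptions chosen for illustration, and the projections are random here rather than meta-learned. The sketch only shows how a single query feature could attend over unlabelled support features in one feed-forward pass, with no gradient step.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_attention_adapt(query, support, w_q, w_k, w_v):
    """Adapt one query feature using a set of unlabelled support features.

    query:   (d,)   feature of the test example
    support: (n, d) features of support examples of mixed relevance
    Returns the adapted feature (d,) and the attention weights (n,).
    """
    q = query @ w_q                        # (h,)   projected query
    k = support @ w_k                      # (n, h) projected keys
    v = support @ w_v                      # (n, d) projected values
    scores = k @ q / np.sqrt(q.shape[0])   # (n,)   scaled dot-product scores
    weights = softmax(scores)              # relevance of each support example
    adapted = query + weights @ v          # residual update (an assumption)
    return adapted, weights

# Hypothetical sizes and random projections; in the paper's setting the
# module would be meta-learned rather than randomly initialised.
rng = np.random.default_rng(0)
d, h, n = 8, 4, 5
query = rng.normal(size=d)
support = rng.normal(size=(n, d))
w_q = rng.normal(size=(d, h))
w_k = rng.normal(size=(d, h))
w_v = rng.normal(size=(d, d))

adapted, weights = cross_attention_adapt(query, support, w_q, w_k, w_v)
print(adapted.shape, weights.round(3))
```

Because the weights come from a softmax, support examples that score poorly against the query contribute little to the adapted feature, which is one way instance selection can happen without domain labels.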

Paper Structure (23 sections, 5 equations, 5 figures, 8 tables, 2 algorithms)

Figures (5)

  • Figure 1: Illustration of the standard and latent domain adaptation (LDA) settings. In the LDA setting, the adaptation (support) images come from a variety of domains of mixed and unknown relevance to the test (query) image. In standard DA, the adaptation images are all assumed to be equally relevant.
  • Figure 2: Illustration of the desired application scenario where a pre-trained model is deployed to many edge devices. Each device utilizes its own data coming from several domains to quickly adapt the model for the current test image.
  • Figure 3: Analysis of test accuracy (%) vs time per task (ms) for the various approaches evaluated. CXDA achieves the best performance, has similar speed to other feed-forward baselines and is faster than fine-tuning approaches that use back-propagation (1 and 10 adaptation steps are shown for FT-EM and FT-IM). The difference is especially large when the fine-tuning approaches use 10 fine-tuning steps, but even if only 1 step is used there is a visible speed difference. Time per task includes adapting to the task and making a prediction.
  • Figure 4: Density histograms of attention weights for pairs of same and different domain examples in the test tasks of iWildCam.
  • Figure 5: Analysis of attention weights for an example task in iWildCam, with a query image coming from location (camera trap) #288. We show the five support examples in each domain that have the largest and smallest attention weights. Similar images from the same location (#288) are given the largest weights, and relevant images from other locations (e.g. #125) are also given larger weights. The examples with the smallest attention weights do not appear visually relevant.
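
The comparison summarized in Figure 4 can be sketched as follows. This is an illustrative sketch on synthetic data, not the authors' analysis code: the helper `attention_histograms`, the bin count, and the toy domain labels are assumptions. It splits a matrix of query-to-support attention weights by whether each pair shares a domain, then computes a density histogram for each group over shared bin edges.

```python
import numpy as np

def attention_histograms(weights, query_domains, support_domains, bins=10):
    """Density histograms of attention weights for same- vs different-domain pairs.

    weights:         (q, n) non-negative attention weights, one row per query
    query_domains:   (q,)   domain id of each query
    support_domains: (n,)   domain id of each support example
    """
    # Boolean mask marking pairs whose query and support share a domain.
    same = query_domains[:, None] == support_domains[None, :]
    # Shared bin edges so the two histograms are directly comparable.
    edges = np.linspace(0.0, weights.max(), bins + 1)
    same_density, _ = np.histogram(weights[same], bins=edges, density=True)
    diff_density, _ = np.histogram(weights[~same], bins=edges, density=True)
    return edges, same_density, diff_density

# Synthetic example: 3 queries, 6 support examples, 3 hypothetical domains.
rng = np.random.default_rng(0)
raw = rng.random((3, 6))
attn = raw / raw.sum(axis=1, keepdims=True)   # each row sums to 1
query_domains = np.array([0, 1, 2])
support_domains = np.array([0, 0, 1, 1, 2, 2])

edges, same_density, diff_density = attention_histograms(
    attn, query_domains, support_domains, bins=5
)
print(edges.round(3))
print(same_density.round(3), diff_density.round(3))
```

With `density=True`, each histogram integrates to 1 over the bin range, so the two groups can be compared even though they contain different numbers of pairs.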