DART: A Principled Approach to Adversarially Robust Unsupervised Domain Adaptation

Yunjuan Wang; Hussein Hazimeh; Natalia Ponomareva; Alexey Kurakin; Ibrahim Hammoud; Raman Arora

TL;DR

The paper studies adversarial robustness under unsupervised domain adaptation (UDA). It formalizes the adversarial target loss and derives a generalization bound that decomposes it into source performance, a worst-case domain divergence, and an ideal joint risk. Building on this bound, it introduces Divergence Aware Adversarial Training (DART), a practical defense that can be combined with standard UDA methods and supports general threat models, including $\ell_p$ perturbations. The authors also release DomainRobust, a PyTorch testbed with four multi-domain datasets, 46 source-target pairs, and seven meta-algorithms (11 variants) for reproducible robustness evaluation. Empirically, DART improves robustness on average across all benchmarks, with relative gains of up to 29.2% on the source-target pairs considered, while maintaining competitive standard accuracy.

Abstract

Distribution shifts and adversarial examples are two major challenges for deploying machine learning models. While these challenges have been studied individually, their combination is an important topic that remains relatively under-explored. In this work, we study the problem of adversarial robustness under a common setting of distribution shift: unsupervised domain adaptation (UDA). Specifically, given a labeled source domain $D_S$ and an unlabeled target domain $D_T$ with related but different distributions, the goal is to obtain an adversarially robust model for $D_T$. The absence of target domain labels poses a unique challenge, as conventional adversarial robustness defenses cannot be directly applied to $D_T$. To address this challenge, we first establish a generalization bound for the adversarial target loss, which consists of (i) terms related to the loss on the data, and (ii) a measure of worst-case domain divergence. Motivated by this bound, we develop a novel unified defense framework called Divergence Aware adveRsarial Training (DART), which can be used in conjunction with a variety of standard UDA methods, e.g., DANN [Ganin and Lempitsky, 2015]. DART is applicable to general threat models, including the popular $\ell_p$-norm model, and does not require heuristic regularizers or architectural changes. We also release DomainRobust, a testbed for evaluating the robustness of UDA models to adversarial attacks. DomainRobust consists of 4 multi-domain benchmark datasets (with 46 source-target pairs) and 7 meta-algorithms with a total of 11 variants. Our large-scale experiments demonstrate that, on average, DART significantly enhances model robustness on all benchmarks compared to the state of the art, while maintaining competitive standard accuracy. The relative improvement in robustness from DART reaches up to 29.2% on the source-target domain pairs considered.

Paper Structure (44 sections, 3 theorems, 25 equations, 2 figures, 23 tables, 1 algorithm)

Introduction
Related Work
UDA.
Adversarial Robustness.
Adversarial Robustness in UDA.
Problem Setup and Preliminaries
Standard UDA Theory
Adversarially Robust UDA Theory
Divergence Aware Adversarial Training: a practical defense
Pseudo-Labels $\mathbf{\hat{Y}_T}$.
Source Choices $\mathbf{\tilde{\mathcal{Z}}_S}$.
Empirical Evaluation
DomainRobust: A PyTorch Testbed for UDA under Adversarial Attacks
Datasets.
Algorithms.
...and 29 more sections

Key Result

Theorem 2.1

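Given a hypothesis class $\mathcal{H}$, the following holds for every $h \in \mathcal{H}$: $$L^{0/1}(h;\mathcal{D}_T) \le L^{0/1}(h;\mathcal{D}_S) + \frac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) + \gamma(\mathcal{D}_S, \mathcal{D}_T),$$ where $\gamma(\mathcal{D}_S, \mathcal{D}_T):=\min_{h^*\in \mathcal{H}} [ L^{0/1}(h^*;\mathcal{D}_S) + L^{0/1}(h^*;\mathcal{D}_T) ]$ is the joint loss of an ideal classifier that works well on both domains.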
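
To make the terms of this bound concrete, here is a minimal Python sketch. It assumes the standard form of the Ben-David et al. bound (target loss at most source loss, plus half the $\mathcal{H}\Delta\mathcal{H}$-divergence, plus $\gamma$) and the divergence convention that carries a factor of 2. The threshold classifiers, the toy data, and all function names are illustrative assumptions; none of them come from the paper or from DomainRobust. Target labels are used here only to check the inequality. In UDA they would not be available for training.

```python
from itertools import product


def make_threshold(t):
    """Hypothetical 1-D classifier: predict 1 when x >= t, else 0."""
    return lambda x: 1 if x >= t else 0


def zero_one_loss(h, data):
    """Empirical 0/1 loss of h on a list of (x, y) pairs."""
    return sum(h(x) != y for x, y in data) / len(data)


def disagreement(h, g, xs):
    """Fraction of inputs on which two hypotheses disagree."""
    return sum(h(x) != g(x) for x in xs) / len(xs)


def hdh_divergence(hyps, xs_s, xs_t):
    """Empirical H-delta-H divergence (assumed convention: factor of 2)."""
    return 2 * max(
        abs(disagreement(h, g, xs_s) - disagreement(h, g, xs_t))
        for h, g in product(hyps, repeat=2)
    )


def ideal_joint_loss(hyps, src, tgt):
    """gamma: smallest combined source + target loss over the class."""
    return min(zero_one_loss(h, src) + zero_one_loss(h, tgt) for h in hyps)


# Toy data (made up): the target is the source shifted to the right.
source = [(1, 0), (3, 0), (4, 0), (6, 1), (8, 1), (9, 1)]
target = [(3, 0), (5, 0), (6, 0), (8, 1), (10, 1), (11, 1)]
hypotheses = [make_threshold(t) for t in range(13)]

div = hdh_divergence(hypotheses, [x for x, _ in source], [x for x, _ in target])
gamma = ideal_joint_loss(hypotheses, source, target)

# Each row: (target loss, source loss + divergence / 2 + gamma).
rows = [
    (zero_one_loss(h, target), zero_one_loss(h, source) + 0.5 * div + gamma)
    for h in hypotheses
]
bound_holds = all(lhs <= rhs + 1e-12 for lhs, rhs in rows)
print(f"divergence={div:.3f} gamma={gamma:.3f} bound_holds={bound_holds}")
```

Treating the two samples as the full distributions, the inequality holds for every hypothesis in the class by the triangle inequality for the 0/1 loss. The sketch only illustrates how the three terms trade off and says nothing about finite-sample estimation error.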

Figures (2)

Figure 1: Robust accuracy as a function of perturbation size for different algorithms on PACS (Photo$\to$Sketch).
Figure 2: Robust accuracy as a function of attack iterations for different algorithms on PACS (Photo$\to$Sketch).

Theorems & Definitions (5)

Definition 2.1: $\mathcal{H}\Delta\mathcal{H}$-divergence [Ben-David et al., 2010]
Theorem 2.1 [Ben-David et al., 2010]
Theorem 3.1
Theorem A.1
Proof: Proof of Theorem \ref{thm:main}
