Neural Unsupervised Domain Adaptation in NLP---A Survey

Alan Ramponi, Barbara Plank

TL;DR

This survey analyzes neural unsupervised domain adaptation (UDA) in NLP, focusing on learning under domain shift without labeled target data. It presents a three-part taxonomy of model-centric, data-centric, and hybrid approaches, covering pivots, autoencoders, adversarial losses, reweighting, pseudo-labeling, and pre-training strategies such as domain-adaptive pre-training. A key contribution is the variety-space framing of domains, which argues that linguistic variation lies along latent dimensions beyond traditional domain boundaries; the survey also highlights a bias toward sentiment tasks in prior work. The paper calls for standardized, multi-task UDA benchmarks, data-release practices that support the study of diachronic and cross-domain effects, and research on learning under data scarcity and in out-of-distribution scenarios to advance robust NLP systems.

Abstract

Deep neural networks excel at learning from labeled data and achieve state-of-the-art results on a wide array of Natural Language Processing tasks. In contrast, learning from unlabeled data, especially under domain shift, remains a challenge. Motivated by the latest advances, in this survey we review neural unsupervised domain adaptation techniques which do not require labeled target domain data. This is a more challenging yet more widely applicable setup. We outline methods, from early traditional non-neural methods to pre-trained model transfer. We also revisit the notion of domain, and we uncover a bias in the type of Natural Language Processing tasks which received most attention. Lastly, we outline future directions, particularly the broader need for out-of-distribution generalization of future NLP.

Paper Structure

This paper contains 26 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Taxonomy of DA as a special case of transductive transfer learning (left). Related problems (e.g., domain and out-of-distribution generalization) and DA setups (1:1 and multi-source adaptation) (right).