Neural Unsupervised Domain Adaptation in NLP---A Survey
Alan Ramponi, Barbara Plank
TL;DR
This survey analyzes neural unsupervised domain adaptation in NLP, focusing on learning under domain shift without labeled target data. It presents a tripartite taxonomy (model-centric, data-centric, and hybrid approaches) covering pivots, autoencoders, adversarial losses, reweighting, pseudo-labeling, and pre-training strategies such as domain-adaptive pre-training. A key contribution is the variety-space framing of domains, which argues that linguistic variation lies along latent dimensions beyond traditional domain boundaries. The survey also highlights a bias in prior work toward sentiment tasks. Finally, it calls for standardized, multi-task UDA benchmarks, data-release practices that support the study of diachronic and cross-domain effects, and research on learning under data scarcity and in out-of-distribution scenarios to advance robust NLP systems.
Abstract
Deep neural networks excel at learning from labeled data and achieve state-of-the-art results on a wide array of Natural Language Processing tasks. In contrast, learning from unlabeled data, especially under domain shift, remains a challenge. Motivated by the latest advances, in this survey we review neural unsupervised domain adaptation techniques that do not require labeled target domain data. This is a more challenging yet more widely applicable setup. We outline methods, from early traditional non-neural methods to pre-trained model transfer. We also revisit the notion of domain, and we uncover a bias in the type of Natural Language Processing tasks that have received the most attention. Lastly, we outline future directions, particularly the broader need for out-of-distribution generalization of future NLP.
