Exploring Few-Shot Adaptation for Activity Recognition on Diverse Domains
Kunyu Peng, Di Wen, David Schneider, Jiaming Zhang, Kailun Yang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg
TL;DR
This work tackles Few-Shot Domain Adaptation for Activity Recognition (FSDA-AR) by introducing a diverse five-dataset benchmark that spans cinematic, real-world, egocentric, and synthetic domains. It proposes RelaMiX, a unified FSDA-AR approach with three components: a Temporal Relational Attention Network with Relation Dropout (TRAN-RD) for robust temporal generalization, a Statistical Distribution-Based Feature Mixture (SDFM) for latent-space diversity, and Cross-Domain Information Alignment (CDIA) for cross-domain alignment, all using only a few labeled target samples. Empirically, RelaMiX achieves state-of-the-art performance across the FSDA-AR benchmark and is competitive with unsupervised domain adaptation (UDA) methods while using substantially fewer target-domain samples (a few labeled ones instead of large-scale unlabeled data), which illustrates high data efficiency in cross-domain activity recognition. The benchmark and method give future research on data-efficient activity recognition adaptation a common starting point, and the planned code release supports reproducibility and adoption of FSDA-AR in real-world settings.
Abstract
Domain adaptation is essential for activity recognition to ensure accurate and robust performance across diverse environments, sensor types, and data sources. Unsupervised domain adaptation methods have been extensively studied, yet they require large-scale unlabeled data from the target domain. In this work, we focus on Few-Shot Domain Adaptation for Activity Recognition (FSDA-AR), which leverages a very small number of labeled target videos to achieve effective adaptation. This approach is appealing for applications because it needs only a few labeled examples, or even a single one, per class in the target domain, which makes it well suited to recognizing rare but critical activities. However, existing FSDA-AR works mostly focus on domain adaptation between sports video datasets, where domain diversity is limited. We propose a new FSDA-AR benchmark built from five established datasets that covers adaptation across more diverse and challenging domains. Our results demonstrate that FSDA-AR performs comparably to unsupervised domain adaptation while using significantly fewer (yet labeled) target-domain samples. We further propose a novel approach, RelaMiX, to better leverage the few labeled target-domain samples as knowledge guidance. RelaMiX comprises a temporal relational attention network with relation dropout and a cross-domain information alignment mechanism. In addition, it integrates a mechanism that mixes features in the latent space using the few-shot target-domain samples. The proposed RelaMiX solution achieves state-of-the-art performance on all datasets within the FSDA-AR benchmark. To encourage future research on few-shot domain adaptation for activity recognition, our code will be publicly available at https://github.com/KPeng9510/RelaMiX.
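To make the few-shot setting concrete, the following is a minimal sketch of how a K-shot labeled target subset (a few, or even one, labeled example per class) might be drawn from a target-domain label list. It is an illustration only and is not taken from the RelaMiX code; the function name `sample_k_shot`, its parameters, and the example labels are all hypothetical.

```python
import random
from collections import defaultdict


def sample_k_shot(labels, k, seed=0):
    """Return sorted indices of at most k samples per class.

    `labels` holds one class label per target-domain video.
    This helper is a hypothetical illustration, not part of RelaMiX.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Draw up to k indices from each class without replacement.
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        chosen.extend(rng.sample(indices, min(k, len(indices))))
    return sorted(chosen)


# Hypothetical target-domain labels, one per video.
target_labels = ["walk", "run", "walk", "jump", "run", "walk", "jump"]

# One-shot setting: exactly one labeled target video per class.
one_shot = sample_k_shot(target_labels, k=1)
```

In this sketch the remaining target videos would be held out for evaluation; classes with fewer than K videos simply contribute all of their samples.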
