Exploring Few-Shot Adaptation for Activity Recognition on Diverse Domains

Kunyu Peng; Di Wen; David Schneider; Jiaming Zhang; Kailun Yang; M. Saquib Sarfraz; Rainer Stiefelhagen; Alina Roitberg

TL;DR

This work tackles Few-Shot Domain Adaptation for Activity Recognition (FSDA-AR) by introducing a diverse five-dataset benchmark that spans cinematic, real-world, egocentric, and synthetic domains. It proposes RelaMiX, a unified FSDA-AR approach with three components: a Temporal Relational Attention Network with Relation Dropout (TRAN-RD) for robust temporal generalization, a Statistical Distribution-Based Feature Mixture (SDFM) for latent-space diversity, and Cross-Domain Information Alignment (CDIA) for cross-domain alignment, all using only a few labeled target samples. Empirically, RelaMiX achieves state-of-the-art performance across the FSDA-AR benchmark and is competitive with unsupervised domain adaptation (UDA) methods while using far fewer target-domain samples (a few labeled videos rather than large-scale unlabeled data), which illustrates high data efficiency in cross-domain human activity recognition (HAR). Together with the code release, the benchmark offers a reproducible starting point for future work on data-efficient HAR adaptation.

Abstract

Domain adaptation is essential for activity recognition to ensure accurate and robust performance across diverse environments, sensor types, and data sources. Unsupervised domain adaptation methods have been extensively studied, yet they require large-scale unlabeled data from the target domain. In this work, we focus on Few-Shot Domain Adaptation for Activity Recognition (FSDA-AR), which leverages a very small number of labeled target videos to achieve effective adaptation. This approach is appealing for applications because it needs only a few, or even just one, labeled example per class in the target domain, which makes it well suited to recognizing rare but critical activities. However, existing FSDA-AR work mostly focuses on domain adaptation between sports videos, where domain diversity is limited. We propose a new FSDA-AR benchmark built from five established datasets that covers adaptation across more diverse and challenging domains. Our results demonstrate that FSDA-AR performs comparably to unsupervised domain adaptation while using significantly fewer target-domain samples, all of them labeled. We further propose a novel approach, RelaMiX, to better leverage the few labeled target-domain samples as knowledge guidance. RelaMiX comprises a temporal relational attention network with relation dropout and a cross-domain information alignment mechanism. It also integrates a mechanism for mixing features within a latent space using the few-shot target-domain samples. The proposed RelaMiX solution achieves state-of-the-art performance on all datasets within the FSDA-AR benchmark. To encourage future research on few-shot domain adaptation for activity recognition, our code will be publicly available at https://github.com/KPeng9510/RelaMiX.
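To make the few-shot setting concrete, the following is a minimal sketch of how a k-shot labeled subset of the target domain could be drawn, with k labeled videos per class (k = 1 gives the one-shot case mentioned above). The function name `sample_k_shot`, the seed handling, and the toy labels are illustrative assumptions, not part of the RelaMiX code base.

```python
import random
from collections import defaultdict


def sample_k_shot(labels, k, seed=0):
    """Return sorted indices of k randomly chosen samples per class.

    `labels` holds one class id per target-domain video. The sampling
    strategy here is a hypothetical illustration, not the paper's protocol.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label, indices in by_class.items():
        if len(indices) < k:
            raise ValueError(
                f"class {label!r} has only {len(indices)} samples, need {k}"
            )
        # Draw k distinct indices for this class.
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy target-domain labels (assumed for illustration only).
target_labels = ["walk", "cook", "walk", "read", "cook", "read", "walk"]
one_shot = sample_k_shot(target_labels, k=1)
```

Every remaining target video would stay unused during adaptation in this setting, which is what separates it from UDA, where the whole unlabeled target set is consumed.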

Paper Structure (22 sections, 16 equations, 4 figures, 9 tables)

This paper contains 22 sections, 16 equations, 4 figures, 9 tables.

Introduction
Related Work
Method
Problem Formulation
Baselines on FSDA-AR Benchmark
Introduction of RelaMiX Method
Experiments
Datasets
Implementation Details
Analysis of the Benchmark
Ablation Studies
Analysis of Qualitative Results
Conclusion
Social Impact and Limitations
Analysis of the t-SNE Visualization
...and 7 more sections

Figures (4)

Figure 1: (a) Comparison of the Semi-Supervised Domain Adaptation (SSDA), Unsupervised Domain Adaptation (UDA), and Few-Shot Domain Adaptation (FSDA) tasks. (b) An overview of our proposed RelaMiX approach.
Figure 2: The RelaMiX framework processes video by dividing it into overlapping snippets to extract features. It calculates the statistics for these snippets from the source domain, synthesizing cluster centers for the target-domain latent space. Temporal relation sets are refined using Relation-Dropout Multi-Head Self-Attention and Scale-wise Multi-Head Self-Attention for feature learning and aggregation. Cross-Domain Information Alignment (CDIA) loss is used alongside cross-entropy losses to minimize the domain gap.
Figure 3: Qualitative results for FSDA-AR on 20-shot Sims4Action (Roitberg et al., 2021) $\rightarrow$ TSH (Das et al., 2019).
Figure 4: t-SNE feature visualization (van der Maaten and Hinton, 2008) on the UCF test set (Soomro et al., 2012) for FSDA-AR on 20-shot HMDB (Kuehne et al., 2011) $\rightarrow$ UCF.
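
The Figure 2 caption mentions dividing a video into overlapping snippets before feature extraction. A minimal sketch of that step is given below; the function name `overlapping_snippets` and the snippet length and stride values are assumptions chosen for illustration, since the caption does not state them.

```python
def overlapping_snippets(num_frames, snippet_len, stride):
    """Return (start, end) frame ranges for fixed-length snippets.

    `end` is exclusive. Snippets overlap whenever stride < snippet_len.
    Frames that do not fill a complete snippet at the tail are dropped.
    All parameter values are hypothetical, not taken from the paper.
    """
    if snippet_len <= 0 or stride <= 0:
        raise ValueError("snippet_len and stride must be positive")

    last_start = num_frames - snippet_len
    # An empty list is returned when the video is shorter than one snippet.
    return [(s, s + snippet_len) for s in range(0, last_start + 1, stride)]


# A 10-frame video, 4-frame snippets, stride 2: neighbours share 2 frames.
snippets = overlapping_snippets(num_frames=10, snippet_len=4, stride=2)
# snippets == [(0, 4), (2, 6), (4, 8), (6, 10)]
```

Each range would then be passed to a feature extractor, yielding one feature vector per snippet for the later temporal relation modeling.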
