SADA: Semantic adversarial unsupervised domain adaptation for Temporal Action Localization
David Pujol-Perich, Albert Clapés, Sergio Escalera
TL;DR
SADA introduces the first unsupervised domain adaptation (UDA) method tailored for sparse temporal action localization (TAL), enforcing semantic, per-class alignment between the source and target domains. It couples a multi-resolution, anchor-based TAL backbone with a novel adversarial loss that is aware of local class distributions and of background, using pseudo-labels for the unlabeled target domain. The approach yields robust cross-domain transfer, outperforming fully supervised state-of-the-art baselines and existing UDA methods across seven realistic domain-shift benchmarks derived from EpicKitchens100 and CharadesEgo, with gains of up to 6.14% mAP. This work provides a practical, scalable framework for TAL in real-world, domain-heterogeneous video settings and introduces comprehensive benchmarks for evaluating UDA in sparse TAL.
Abstract
Temporal Action Localization (TAL) is a complex task that poses significant challenges, particularly when attempting to generalize to new, unseen domains in real-world applications. These scenarios, despite being realistic, are often neglected in the literature, leaving existing solutions exposed to substantial performance degradation. In this work, we tackle this issue by introducing, for the first time, an approach for Unsupervised Domain Adaptation (UDA) in sparse TAL, which we refer to as Semantic Adversarial unsupervised Domain Adaptation (SADA). Our contributions are threefold: (1) we pioneer the development of a domain adaptation model that operates on realistic sparse action detection benchmarks; (2) we tackle the limitations of global-distribution alignment techniques by introducing a novel adversarial loss that is sensitive to local class distributions, ensuring finer-grained adaptation; and (3) we present a novel set of benchmarks based on EpicKitchens100 and CharadesEgo that evaluate multiple domain shifts in a comprehensive manner. Our experiments indicate that SADA improves adaptation across domains when compared to fully supervised state-of-the-art and alternative UDA methods, attaining a performance boost of up to 6.14% mAP.
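
To make the idea of a class-sensitive adversarial loss more concrete, the following is a minimal sketch of one possible class-conditional domain loss, not the formulation used in SADA. It assumes a hypothetical discriminator that emits one domain logit per class for every anchor (index 0 standing for background), and it assumes that target-domain class labels come from thresholded pseudo-labels. The function names, the threshold value, and the use of binary cross-entropy are all illustrative assumptions.

```python
import math


def bce_with_logit(logit, target):
    """Numerically stable binary cross-entropy computed on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def pseudo_labels(class_scores, threshold=0.5):
    """Assign each anchor a class id from its foreground scores.

    Assumption: an anchor whose best foreground score is below `threshold`
    is treated as background (class id 0); foreground ids start at 1.
    """
    labels = []
    for scores in class_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append(best + 1 if scores[best] >= threshold else 0)
    return labels


def class_conditional_domain_loss(domain_logits, class_ids, domain_label):
    """Average domain-classification loss, routed through per-class logits.

    domain_logits: one list per anchor, holding one domain logit per class
                   (index 0 = background). Hypothetical discriminator output.
    class_ids:     ground-truth labels (source) or pseudo-labels (target).
    domain_label:  0.0 for source anchors, 1.0 for target anchors.
    """
    if len(domain_logits) != len(class_ids):
        raise ValueError("one class id is required per anchor")
    if not domain_logits:
        return 0.0
    total = 0.0
    for logits, class_id in zip(domain_logits, class_ids):
        # Only the logit of the anchor's own class contributes, so alignment
        # happens per class rather than over the global feature distribution.
        total += bce_with_logit(logits[class_id], domain_label)
    return total / len(domain_logits)


# Toy usage: two target anchors, two foreground classes plus background.
target_scores = [[0.9, 0.1], [0.2, 0.3]]
target_ids = pseudo_labels(target_scores)
target_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss = class_conditional_domain_loss(target_logits, target_ids, 1.0)
```

In an adversarial setup, a loss of this kind would typically be minimized by the discriminator while the feature extractor is trained to maximize it, for example through a gradient reversal layer. That training loop is omitted here.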
